doc page extractor

English | 中文

Introduction

doc page extractor identifies text and formatting in page images and returns the result as structured data.

Installation

pip install doc-page-extractor

pip install onnxruntime==1.21.0

Using CUDA

Refer to the PyTorch installation guide and choose the install command that matches your operating system.

Then, instead of the onnxruntime command from the Installation section above, run the following:

pip install onnxruntime-gpu==1.21.0
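Before passing device="cuda" to the extractor, it can help to confirm that PyTorch actually sees a GPU. The following is a minimal sketch, not part of this package: select_device is a hypothetical helper name, and it assumes that PyTorch's own CUDA check is a reasonable proxy for whether device="cuda" will work.

```python
def select_device() -> str:
    """Return "cuda" when PyTorch reports a usable GPU, otherwise "cpu".

    Hypothetical helper for illustration; it is not provided by
    doc_page_extractor.
    """
    try:
        import torch  # imported lazily so the check degrades gracefully
    except ImportError:
        # PyTorch is not installed, so fall back to the CPU setting.
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


device = select_device()
print(device)
```

The returned string can then be used as the device argument shown in the Example section.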

Example

from PIL import Image
from doc_page_extractor import DocExtractor

model_path = "/path/to/model/dir" # Placeholder: folder where the AI models are downloaded and stored

extractor = DocExtractor(
  model_dir_path=model_path,
  device="cpu", # Change to device="cuda" to use CUDA
)
with Image.open("/path/to/your/image.png") as image:
  result = extractor.extract(
    image=image,
    lang="ch", # Language of the text in the image
  )
# Each layout groups the text fragments detected in one region of the page
for layout in result.layouts:
  for fragment in layout.fragments:
    print(fragment.rect, fragment.text)
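If the output needs to be stored or processed further rather than printed, the nested layouts and fragments can be flattened into plain records. This is a sketch under stated assumptions: layouts_to_records is a hypothetical helper, it relies only on the layouts, fragments, rect and text attributes used in the example, and the SimpleNamespace objects are stand-ins so the snippet runs without the models installed.

```python
from types import SimpleNamespace


def layouts_to_records(result):
    """Flatten result.layouts into a list of dicts, one per fragment.

    Hypothetical helper; it only reads the attributes used in the
    example (layouts, fragments, rect, text).
    """
    records = []
    for layout_index, layout in enumerate(result.layouts):
        for fragment in layout.fragments:
            records.append({
                "layout": layout_index,  # position of the layout on the page list
                "rect": fragment.rect,   # passed through unchanged
                "text": fragment.text,
            })
    return records


# Stand-in data shaped like the extractor output, for demonstration only.
fake_result = SimpleNamespace(layouts=[
    SimpleNamespace(fragments=[
        SimpleNamespace(rect=None, text="Hello"),
        SimpleNamespace(rect=None, text="world"),
    ]),
    SimpleNamespace(fragments=[
        SimpleNamespace(rect=None, text="Second block"),
    ]),
])

records = layouts_to_records(fake_result)
for record in records:
    print(record["layout"], record["text"])
```

With a real extraction, pass the result returned by extractor.extract in place of fake_result.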

Acknowledgements

The code of doc_page_extractor/onnxocr in this repo comes from OnnxOCR.

LiZhenzhuBlog/doc-page-extractor
