MinerU utilizes a collection of specialized machine learning models to perform document decomposition. These models are managed as atomic units and orchestrated by the pipeline to transform raw page images into structured data. This page provides a high-level overview of these subsystems and their roles within the MinerU architecture.
The lifecycle of these models is managed by the AtomModelSingleton mineru/backend/pipeline/model_init.py148-157, which ensures that heavy model weights are loaded into memory only once and shared across processing tasks using a thread-safe implementation mineru/backend/pipeline/model_init.py184-187.
The following diagram illustrates how the different model subsystems interact during the processing of a single document page within the BatchAnalyze orchestration layer mineru/backend/pipeline/batch_analyze.py52-81
Sources: mineru/backend/pipeline/batch_analyze.py52-81 mineru/backend/pipeline/model_init.py189-220
This diagram maps system functions to their specific implementation classes and file paths within the codebase, as resolved by the AtomModelSingleton mineru/backend/pipeline/model_init.py159-187
Sources: mineru/backend/pipeline/model_init.py159-220 mineru/backend/pipeline/model_list.py2-10 mineru/model/table/cls/mineru_table_ori_cls.py25-27
The layout detection subsystem identifies functional regions such as text blocks, titles, figures, tables, and formulas.
- Model: PPDocLayoutV2LayoutModel, which categorizes regions into functional labels like doc_title, table, and display_formula.
- Initialization: pp_doclayout_v2_model_init mineru/backend/pipeline/model_init.py126-130
- Related helper: _prune_empty_ocr_text_blocks mineru/backend/pipeline/batch_analyze.py166-180
- Locking: PIPELINE_LAYOUT_INFERENCE_LOCK mineru/backend/pipeline/model_init.py24, when enabled via environment variables mineru/backend/pipeline/model_init.py28-30

For details, see Layout Detection & Reading Order.
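An inference lock that is switched on through the environment could look roughly like the sketch below. The variable name `EXAMPLE_LAYOUT_LOCK`, the helper names, and the accepted truthy values are assumptions made for illustration; the real switch is defined in model_init.py and may behave differently.

```python
import os
import threading
from contextlib import nullcontext

# Hypothetical environment variable; not the name MinerU actually reads.
LOCK_ENABLED = os.environ.get("EXAMPLE_LAYOUT_LOCK", "0").lower() in ("1", "true", "yes")
LAYOUT_LOCK = threading.Lock()


def layout_guard(enabled=LOCK_ENABLED):
    """Return the shared lock when locking is enabled, else a no-op context."""
    return LAYOUT_LOCK if enabled else nullcontext()


def run_layout(predict, image, enabled=LOCK_ENABLED):
    """Call predict(image), serialized across threads only when enabled."""
    with layout_guard(enabled):
        return predict(image)
```

With locking disabled, calls run concurrently; with it enabled, only one thread at a time enters the model call.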
MinerU employs a dual-path recognition strategy to handle the structural diversity of tables.
- Orientation: MineruTableOrientationClsModel mineru/backend/pipeline/model_init.py72-82. It uses OCR scores to determine the final rotation angle (0, 90, or 270 degrees) mineru/model/table/cls/mineru_table_ori_cls.py22
- Table classification: PaddleTableClsModel determines whether a table is a WiredTable (has visible grid lines) or a WirelessTable.
- Wired tables: UnetTableModel mineru/backend/pipeline/model_init.py98, which uses a UNet-based architecture.
- Wireless tables: PaddleTableModel mineru/backend/pipeline/model_init.py111, which uses the SlanetPlus architecture.

For details, see Table Recognition.
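The dual-path flow (orient, classify, then dispatch to the wired or wireless recognizer) can be sketched as follows. `TableRouter` and its callable fields are hypothetical placeholders for the real model classes, and the string labels `"wired"` and `"wireless"` are assumptions for the example.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class TableRouter:
    """Hypothetical dispatcher; each field stands in for one model."""

    orient: Callable[[Any], int]      # returns a rotation angle: 0, 90, or 270
    rotate: Callable[[Any, int], Any]  # corrects the image for that angle
    classify: Callable[[Any], str]    # returns "wired" or "wireless"
    wired: Callable[[Any], Any]       # recognizer for tables with grid lines
    wireless: Callable[[Any], Any]    # recognizer for tables without grid lines

    def recognize(self, image):
        angle = self.orient(image)
        if angle:
            # Only rotate when the orientation step reports a non-zero angle.
            image = self.rotate(image, angle)
        kind = self.classify(image)
        model = self.wired if kind == "wired" else self.wireless
        return model(image)


# Stub models so the routing can be exercised without any real weights.
router = TableRouter(
    orient=lambda img: img["angle"],
    rotate=lambda img, angle: {**img, "angle": 0, "rotated_by": angle},
    classify=lambda img: "wired" if img["has_lines"] else "wireless",
    wired=lambda img: ("wired", img),
    wireless=lambda img: ("wireless", img),
)

wired_result = router.recognize({"angle": 90, "has_lines": True})
wireless_result = router.recognize({"angle": 0, "has_lines": False})
```

Keeping the classifier separate from the recognizers means either path can be swapped without touching the other.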
The OCR subsystem handles text extraction from non-digitized PDF regions and image-based content.
- Engine: PytorchPaddleOCR, initialized via ocr_model_init mineru/backend/pipeline/model_init.py133-145
- Related constant: OCR_DET_BASE_BATCH_SIZE mineru/backend/pipeline/batch_analyze.py40
- Related helper: mask_formula_regions_for_ocr_det mineru/backend/pipeline/batch_analyze.py88

For details, see OCR Engine.
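Judging only by its name, mask_formula_regions_for_ocr_det blanks out formula areas before text detection runs. The sketch below shows that general idea on a plain nested-list image; the function `mask_regions`, the box format `(x0, y0, x1, y1)`, and the white fill value are assumptions, not the behavior of the real helper.

```python
def mask_regions(image, boxes, fill=255):
    """Return a copy of a 2D grayscale image (list of rows) with each box filled.

    Boxes are (x0, y0, x1, y1) with exclusive right and bottom edges.
    """
    height = len(image)
    width = len(image[0]) if height else 0
    masked = [row[:] for row in image]  # copy so the input stays untouched
    for x0, y0, x1, y1 in boxes:
        # Clamp each box to the image bounds before filling.
        x0, y0 = max(0, x0), max(0, y0)
        x1, y1 = min(width, x1), min(height, y1)
        for y in range(y0, y1):
            for x in range(x0, x1):
                masked[y][x] = fill
    return masked


page = [[0] * 4 for _ in range(3)]
masked = mask_regions(page, [(1, 0, 3, 2)])
```

Masking on a copy leaves the original page available for the formula recognizer, which still needs those pixels.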
Mathematical content is handled by a specialized two-stage pipeline: Mathematical Formula Detection (MFD) and Mathematical Formula Recognition (MFR).
Recognition models are created by mfr_model_init mineru/backend/pipeline/model_init.py115-123:
- UnimernetModel: The default model for general formula recognition mineru/backend/pipeline/model_init.py117
- FormulaRecognizer: PP-FormulaNet-Plus-M, used when Chinese formula support is enabled mineru/backend/pipeline/model_init.py119

For details, see Formula Recognition (MFD/MFR).
Models are initialized through a centralized factory pattern in atom_model_init mineru/backend/pipeline/model_init.py189-220 to manage device placement and lifecycle.
| Model Category | Class Name | Initialization Function |
|---|---|---|
| Layout | PPDocLayoutV2LayoutModel | pp_doclayout_v2_model_init mineru/backend/pipeline/model_init.py126 |
| MFR | UnimernetModel / FormulaRecognizer | mfr_model_init mineru/backend/pipeline/model_init.py115 |
| OCR | PytorchPaddleOCR | ocr_model_init mineru/backend/pipeline/model_init.py133 |
| Table Cls | PaddleTableClsModel | table_cls_model_init mineru/backend/pipeline/model_init.py85 |
| Wired Table | UnetTableModel | wired_table_model_init mineru/backend/pipeline/model_init.py89 |
| Wireless Table | PaddleTableModel | wireless_table_model_init mineru/backend/pipeline/model_init.py102 |
| Orientation | MineruTableOrientationClsModel | table_orientation_cls_model_init mineru/backend/pipeline/model_init.py72 |
Sources: mineru/backend/pipeline/model_init.py1-220 mineru/backend/pipeline/batch_analyze.py38-50 mineru/backend/pipeline/model_list.py1-11 mineru/model/table/cls/mineru_table_ori_cls.py13-22
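A centralized factory of this kind can be sketched with a small registry that maps a model name to its initialization function. Everything here is hypothetical: the registry, the `register` decorator, the name strings, and the `chinese_formula` flag are illustration only, and atom_model_init may be structured quite differently.

```python
from typing import Any, Callable, Dict

# Hypothetical registry: model name -> initialization function.
_REGISTRY: Dict[str, Callable[..., Any]] = {}


def register(name: str):
    """Decorator that records an init function under a model name."""
    def decorator(fn):
        _REGISTRY[name] = fn
        return fn
    return decorator


@register("layout")
def init_layout(device="cpu", **_):
    # Stand-in for building the layout model on the requested device.
    return ("PPDocLayoutV2LayoutModel", device)


@register("mfr")
def init_mfr(device="cpu", chinese_formula=False, **_):
    # Mirrors the documented choice: the default recognizer, or the
    # alternative one when Chinese formula support is enabled.
    cls_name = "FormulaRecognizer" if chinese_formula else "UnimernetModel"
    return (cls_name, device)


def init_model(name: str, **kwargs):
    """Single entry point that dispatches to the registered init function."""
    try:
        factory = _REGISTRY[name]
    except KeyError:
        raise ValueError(f"unknown model name: {name!r}") from None
    return factory(**kwargs)


default_mfr = init_model("mfr", device="cpu")
chinese_mfr = init_model("mfr", device="cpu", chinese_formula=True)
```

Routing every request through one function gives a single place to apply device placement and to reject unknown model names early.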