This glossary defines the core technical terms, data structures, and domain-specific concepts used within the MinerU codebase, providing a bridge between conceptual documentation and the underlying implementation.
**Pipeline Backend**: The traditional extraction architecture, which uses a sequence of specialized small models. It coordinates layout detection, formula recognition, and OCR to reconstruct the document.
- Uses `BatchAnalyze` to process document windows and `MagicModel` (Pipeline version) for structure reconstruction. mineru/backend/pipeline/batch_analyze.py52-81 mineru/backend/pipeline/pipeline_magic_model.py17-127
- Defines base batch sizes `LAYOUT_BASE_BATCH_SIZE` (1), `MFR_BASE_BATCH_SIZE` (16), and `OCR_DET_BASE_BATCH_SIZE` (8). mineru/backend/pipeline/batch_analyze.py38-40
- Relies on `AtomModelSingleton` so that models such as OCR and MFR are loaded only once and shared across the pipeline. mineru/backend/pipeline/model_init.py148-188

**VLM Backend**: A modern extraction architecture that leverages Vision-Language Models (e.g., MinerU2.5-Pro) to perform end-to-end document understanding.
- Uses `MagicModel` (VLM version) to convert raw VLM output blocks into a structured hierarchy. mineru/backend/vlm/vlm_magic_model.py29-184
- Relies on `BlockType` and `ContentType` during the `union_make` process. mineru/backend/vlm/vlm_middle_json_mkcontent.py162-232
- Uses `merge_para_with_text` to join spans, handling hyphenation and full-width to half-width character conversion. mineru/backend/vlm/vlm_middle_json_mkcontent.py146

**Hybrid Backend**: An advanced architecture that combines VLM-based layout understanding with specialized pipeline models (OCR, MFR) to achieve high accuracy.
- Implemented in `hybrid_analyze`. mineru/backend/hybrid/hybrid_analyze.py30-36
- Supports `medium` and `high` effort levels. `medium` effort forces image analysis off to keep a fast path. mineru/backend/hybrid/hybrid_analyze.py110-122
- Calls pipeline models through `run_layout_inference`, `run_mfr_inference`, and `run_ocr_inference`. mineru/backend/pipeline/model_init.py41-60

**Office Document Processing**: Specialized processing for Office documents (.docx, .pptx, .xlsx).
- Converts Office files into `middle_json` using tools such as `DocxConverter`, `mammoth`, and `python-docx`. mineru/model/docx/docx_converter.py43 pyproject.toml56 pyproject.toml54

**middle_json**: The standardized intermediate representation used by MinerU. All backends convert raw model outputs into this schema before generating the final Markdown/JSON.
- Key fields include `pdf_info`, `_backend`, `_effort`, and `_version_name`. mineru/backend/hybrid/hybrid_model_output_to_middle_json.py181-190

Sources: mineru/backend/pipeline/batch_analyze.py38-81 mineru/backend/hybrid/hybrid_analyze.py110-122 mineru/backend/pipeline/model_init.py41-188 mineru/backend/vlm/vlm_middle_json_mkcontent.py146-232 mineru/backend/hybrid/hybrid_model_output_to_middle_json.py181-190 mineru/backend/vlm/vlm_magic_model.py29-184
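The sketch below makes the `middle_json` shape more concrete. It is a simplified illustration and assumes a per-page layout.

- **From the source:** the four top-level keys (`pdf_info`, `_backend`, `_effort`, `_version_name`) and the fact that reading order is carried by an `index` property in `middle_json`.
- **Hypothetical:** the nested page fields (`page_idx`, `para_blocks`, `type`, `text`), all values, and both helper functions. They are not MinerU's actual schema or code.

```python
from typing import Any


def make_middle_json(pages: list[dict[str, Any]]) -> dict[str, Any]:
    """Build a simplified middle_json-like document (hypothetical field layout)."""
    return {
        "pdf_info": pages,          # one entry per page
        "_backend": "hybrid",       # placeholder value
        "_effort": "medium",        # placeholder value
        "_version_name": "0.0.0",   # placeholder value
    }


def blocks_in_reading_order(page: dict[str, Any]) -> list[dict[str, Any]]:
    """Return a page's blocks sorted by their reading-order index (assumed field placement)."""
    return sorted(page["para_blocks"], key=lambda block: block["index"])


page = {
    "page_idx": 0,
    "para_blocks": [
        {"index": 2, "type": "text", "text": "Body paragraph."},
        {"index": 0, "type": "title", "text": "Document title"},
        {"index": 1, "type": "text", "text": "Abstract."},
    ],
}
doc = make_middle_json([page])
for block in blocks_in_reading_order(doc["pdf_info"][0]):
    print(block["type"], block["text"])
```

Because reading order lives in the data as an explicit index, a renderer can re-sort blocks without re-running layout analysis.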
The following diagram illustrates the relationship between input types, processing backends, and the unified output generation.
Sources: mineru/backend/pipeline/batch_analyze.py52 mineru/backend/hybrid/hybrid_analyze.py30 mineru/backend/hybrid/hybrid_model_output_to_middle_json.py181 mineru/utils/enum_class.py89-93 mineru/backend/vlm/vlm_analyze.py30
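The hybrid backend's effort rule is stated as: `medium` effort forces image analysis off to keep a fast path. The sketch below expresses that rule as a small option resolver. It illustrates the rule only. `HybridOptions`, `resolve_options`, and the `image_analysis` flag are hypothetical names, not MinerU's API. The real logic lives in mineru/backend/hybrid/hybrid_analyze.py110-122.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HybridOptions:
    """Hypothetical option bundle; field names are illustrative, not MinerU's API."""
    effort: str = "high"
    image_analysis: bool = True


def resolve_options(effort: str, image_analysis: bool = True) -> HybridOptions:
    """Apply the effort rule: 'medium' forces image analysis off for a fast path."""
    if effort not in ("medium", "high"):
        raise ValueError(f"unsupported effort level: {effort!r}")
    if effort == "medium":
        image_analysis = False  # fast path: skip image analysis regardless of the request
    return HybridOptions(effort=effort, image_analysis=image_analysis)


print(resolve_options("medium", image_analysis=True))
print(resolve_options("high"))
```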
**MFD (Mathematical Formula Detection)**: The process of locating mathematical formulas within a page image.
- Controlled by the `formula_enable` flag in `BatchAnalyze`. mineru/backend/pipeline/batch_analyze.py66

**MFR (Mathematical Formula Recognition)**: The process of converting detected formula images into LaTeX strings.
- Supported models include `unimernet_small` and `pp_formulanet_plus_m`. mineru/backend/pipeline/model_init.py114-123
- Inference is routed through `run_mfr_inference` to handle resource contention. mineru/backend/pipeline/model_init.py49-53

**Layout Analysis**: The classification of page regions into categories such as text, title, figure, table, or formula.
- Categories are defined in `BlockType`, including specialized types such as `abstract`, `doc_title`, and `vertical_text`. mineru/utils/enum_class.py4-50
- `MEDIUM_EFFORT_LAYOUT_LABEL_TO_VLM_TYPE` maps pipeline layout labels to VLM types for the hybrid backend. mineru/backend/hybrid/hybrid_analyze.py83-107

**Reading Order**: The logic used to sort detected layout blocks into a human-readable sequence.
- Represented by the `index` property in `middle_json`. mineru/backend/hybrid/hybrid_model_output_to_middle_json.py116

**Table Recognition**: The process of identifying table structure (rows, columns, cells).
- Models: `slanet_plus` (wireless, i.e. borderless tables) and `unet_structure` (wired, i.e. ruled tables). mineru/utils/enum_class.py105-106
- `MineruTableOrientationClsModel` handles table rotation detection. mineru/backend/pipeline/model_init.py81
- Tables that span pages are merged by `cross_page_table_merge` during `middle_json` finalization. mineru/backend/hybrid/hybrid_model_output_to_middle_json.py16

MinerU uses singleton patterns to manage heavy ML models in memory, ensuring thread-safe initialization and resource sharing.
| Class | Purpose | File Pointer |
|---|---|---|
| `AtomModelSingleton` | Manages pipeline models (OCR, MFD, MFR, Layout) | mineru/backend/pipeline/model_init.py148-188 |
| `PytorchPaddleOCR` | PyTorch port of PaddleOCR for text detection and recognition | mineru/model/ocr/pytorch_paddle.py |
| `HybridModelSingleton` | Singleton wrapper for hybrid-specific pipeline models | mineru/backend/pipeline/model_init.py22 |
| `ModelSingleton` | Lifecycle management for VLM backend models | mineru/backend/vlm/vlm_analyze.py31 |
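The singleton classes above share one idea: load each heavy model once and hand the same object to every caller. The sketch below shows that idea using double-checked locking. `ModelCache`, `get_model`, the tuple key, and the fake loader are hypothetical. The sketch does not reproduce the actual interface of `AtomModelSingleton` or `ModelSingleton`.

```python
import threading


class ModelCache:
    """Hypothetical thread-safe singleton that loads each model config only once."""

    _instance = None
    _instance_lock = threading.Lock()

    def __new__(cls):
        # Double-checked locking: the fast path skips the lock once the instance exists.
        if cls._instance is None:
            with cls._instance_lock:
                if cls._instance is None:
                    instance = super().__new__(cls)
                    instance._models = {}
                    instance._models_lock = threading.Lock()
                    cls._instance = instance
        return cls._instance

    def get_model(self, key, loader):
        # One lock keeps the sketch simple: no model is ever built twice,
        # at the cost of serializing loads of different models.
        with self._models_lock:
            if key not in self._models:
                self._models[key] = loader()
            return self._models[key]


calls = []


def load_fake_ocr():
    calls.append("ocr")  # stands in for an expensive weight load
    return object()


first = ModelCache().get_model(("ocr", "ch"), load_fake_ocr)
second = ModelCache().get_model(("ocr", "ch"), load_fake_ocr)
print(first is second, len(calls))
```

Keying the cache on the full model configuration, rather than on the model type alone, lets different languages or variants coexist while identical requests still share one instance.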
Final output generation is handled by union_make functions that transform middle_json into target formats.
| Entity | Role | File Pointer |
|---|---|---|
| `union_make` (VLM) | Renders Markdown and JSON from VLM outputs | mineru/backend/vlm/vlm_middle_json_mkcontent.py |
| `merge_para_with_text` | Core utility for joining spans into paragraphs, with hyphen handling | mineru/backend/vlm/vlm_middle_json_mkcontent.py146 |
| `blocks_to_page_info` | Converts `MagicModel` blocks into `middle_json` page structures | mineru/backend/hybrid/hybrid_model_output_to_middle_json.py52-124 |
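Two behaviours are documented for `merge_para_with_text`: joining hyphenated spans and converting full-width characters to half-width. The sketch below illustrates both with a simple heuristic. It is not MinerU's implementation, and `full_to_half` and `join_spans` are hypothetical helpers.

- **Width conversion:** maps the full-width ASCII variants (U+FF01 to U+FF5E) and the ideographic space (U+3000) to their half-width forms.
- **Hyphen handling:** drops a trailing hyphen only when the next span starts with a lowercase letter. This can wrongly merge genuine compounds such as "state-of-" + "the-art".
- **Not handled:** language-specific spacing, such as the absence of spaces between CJK spans.

```python
def full_to_half(text: str) -> str:
    """Map full-width ASCII variants (U+FF01-U+FF5E) and U+3000 to half-width."""
    out = []
    for ch in text:
        code = ord(ch)
        if code == 0x3000:                 # ideographic space
            out.append(" ")
        elif 0xFF01 <= code <= 0xFF5E:     # full-width '!' .. '~'
            out.append(chr(code - 0xFEE0))
        else:
            out.append(ch)
    return "".join(out)


def join_spans(spans: list[str]) -> str:
    """Join line spans into one paragraph, undoing end-of-line hyphenation (heuristic)."""
    result = ""
    for raw in spans:
        span = full_to_half(raw).strip()
        if not span:
            continue                       # skip empty spans
        if not result:
            result = span
        elif result.endswith("-") and span[0].islower():
            result = result[:-1] + span    # "implemen-" + "tation" -> "implementation"
        else:
            result += " " + span
    return result


print(join_spans(["The implemen-", "tation uses ＡＢＣ１２３."]))
```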
The following diagram maps user-facing interfaces to the underlying orchestration and inference layers.
Sources: pyproject.toml128-136 mineru/backend/pipeline/batch_analyze.py52 mineru/backend/hybrid/hybrid_analyze.py30
Sources: mineru/utils/enum_class.py1-134 mineru/backend/pipeline/batch_analyze.py38-40
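The pipeline backend defines per-stage base batch sizes: `LAYOUT_BASE_BATCH_SIZE` (1), `MFR_BASE_BATCH_SIZE` (16), and `OCR_DET_BASE_BATCH_SIZE` (8). The sketch below shows the general technique of chunking work items by a per-stage batch size. Two parts are assumptions: the `batched` helper and its `ratio` parameter are hypothetical, and the idea that the "base" sizes are scaled by a hardware-dependent factor is an inference from the name, not something confirmed against `BatchAnalyze`.

```python
from typing import Iterator, Sequence, TypeVar

T = TypeVar("T")

# Base batch sizes as listed for mineru/backend/pipeline/batch_analyze.py38-40.
LAYOUT_BASE_BATCH_SIZE = 1
MFR_BASE_BATCH_SIZE = 16
OCR_DET_BASE_BATCH_SIZE = 8


def batched(items: Sequence[T], base_size: int, ratio: int = 1) -> Iterator[Sequence[T]]:
    """Yield consecutive chunks; chunk size is base_size * ratio (assumed scaling)."""
    size = max(1, base_size * ratio)
    for start in range(0, len(items), size):
        yield items[start:start + size]


formula_crops = [f"crop_{i}" for i in range(40)]  # stand-ins for cropped formula images
batch_lengths = [len(b) for b in batched(formula_crops, MFR_BASE_BATCH_SIZE)]
print(batch_lengths)
```

Separate sizes per stage reflect how different the models are: a small formula crop is far cheaper to batch than a full-page layout image.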