MinerU is designed as a multi-backend document parsing framework that transforms raw PDF bytes, images, or Office documents into structured Markdown and JSON. The system abstracts the complexity of different AI models and OCR engines through a unified orchestration layer, allowing users to choose between speed, accuracy, and local vs. remote computing power.
The data flow in MinerU typically follows a four-stage process:
1. Input loading: files are read through read_fn, which handles PDF, various image formats, and Office documents (mineru/cli/common.py171-184).
2. Backend selection: the parsing path is chosen through the backend parameter. For programmatic use, do_parse or aio_do_parse act as the primary entry points.
3. Inference: the selected backend produces a model_output.
4. Output generation: results are converted to the middle_json format, which is then processed by union_make to generate final Markdown or Content Lists (mineru/cli/common.py24-25).

The following diagram illustrates how CLI/API calls translate into internal code entities and flow through the backends.
MinerU Execution Path
Sources: mineru/cli/client.py179-183 mineru/cli/common.py24-30 mineru/cli/gradio_app.py72-133 mineru/cli/gradio_app.py54
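The four-stage flow described above can be sketched in a few lines. This is a minimal illustration, not MinerU's code: `MiddleJsonSketch`, `load_input`, `toy_pipeline_backend`, `BACKENDS`, `render_markdown`, and `parse_sketch` are hypothetical names, and the toy backend simply splits text into lines instead of running any model.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class MiddleJsonSketch:
    """Hypothetical stand-in for the middle_json intermediate format."""
    backend: str
    blocks: List[dict]


def load_input(data: bytes) -> bytes:
    """Stage 1: validate raw input bytes (the role read_fn plays in the text)."""
    if not data:
        raise ValueError("input is empty")
    return data


def toy_pipeline_backend(data: bytes) -> List[dict]:
    """Placeholder backend: one text block per non-empty line (no real model)."""
    lines = data.decode("utf-8").splitlines()
    return [{"type": "text", "text": line} for line in lines if line.strip()]


# Stage 2: backends are looked up by name, mirroring a backend parameter.
BACKENDS: Dict[str, Callable[[bytes], List[dict]]] = {
    "pipeline": toy_pipeline_backend,
}


def render_markdown(middle: MiddleJsonSketch) -> str:
    """Stage 4: build Markdown from the intermediate form only."""
    return "\n\n".join(block["text"] for block in middle.blocks)


def parse_sketch(data: bytes, backend: str = "pipeline") -> str:
    """Run the four stages in order and return Markdown."""
    raw = load_input(data)
    if backend not in BACKENDS:
        raise ValueError(f"unknown backend: {backend}")
    model_output = BACKENDS[backend](raw)  # Stage 3: backend-specific inference
    middle = MiddleJsonSketch(backend=backend, blocks=model_output)
    return render_markdown(middle)
```

Because rendering only sees the intermediate object, a new backend can be registered in the dictionary without touching the output code.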
MinerU provides three distinct architectural paths for document parsing, each optimized for different hardware and accuracy requirements.
The Pipeline Backend is the "traditional" approach. It utilizes a series of specialized small models: YOLO-based layout detection (PP-DocLayoutV2), specialized models for formula detection (MFD) and recognition (MFR), and an OCR engine for text extraction. It supports streaming processing via window-based batching.
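Window-based batching can be illustrated with a small generator. The sketch below is an assumption about the general technique, not the logic of doc_analyze_streaming: `iter_windows`, `analyze_streaming_sketch`, and the `analyze_batch` callback are hypothetical names.

```python
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def iter_windows(items: Iterable[T], window_size: int) -> Iterator[List[T]]:
    """Yield consecutive windows of at most window_size items."""
    if window_size < 1:
        raise ValueError("window_size must be at least 1")
    window: List[T] = []
    for item in items:
        window.append(item)
        if len(window) == window_size:
            yield window
            window = []
    if window:
        # Emit the final, possibly shorter, window.
        yield window


def analyze_streaming_sketch(
    pages: Iterable[T],
    window_size: int,
    analyze_batch: Callable[[List[T]], List[R]],
) -> Iterator[R]:
    """Process pages window by window so only one window is held at a time."""
    for window in iter_windows(pages, window_size):
        yield from analyze_batch(window)
```

Results are yielded as each window finishes, so a caller can start writing output before the whole batch has been analyzed.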
- doc_analyze_streaming (mineru/backend/pipeline/pipeline_analyze.py157-166) manages a processing window to handle multi-file batches efficiently (mineru/backend/pipeline/pipeline_analyze.py207-212). It uses a ModelSingleton to manage the lifecycle of the layout and OCR models (mineru/backend/pipeline/pipeline_analyze.py33-59).
- append_batch_results_to_middle_json (mineru/backend/pipeline/model_json_to_middle_json.py107-117) appends batch results to the middle_json.

The VLM Backend uses Vision-Language Models (such as MinerU2.5-Pro) to perform end-to-end document understanding. It supports multiple inference engines, including vllm, lmdeploy, mlx, sglang, and transformers.
- Inference engines (vllm, lmdeploy): see mineru/utils/engine_utils.py20.
- vlm_doc_analyze is the primary entry point for document analysis (mineru/cli/common.py26-27).

The Hybrid Backend combines the structural layout understanding of VLMs with the high-precision OCR and formula recognition of the pipeline models. It uses the VLM to define the document structure and then "fills" the content using specialized models for text and formulas.
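The "structure first, then fill" idea behind the hybrid approach can be sketched as follows. Everything here is hypothetical: `Block`, `fill_blocks`, and the recognizer callbacks are illustrative names, and the real backend's data structures are not shown in this overview.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class Block:
    """Hypothetical layout block: a kind, a bounding box, optional content."""
    kind: str                      # e.g. "text" or "formula"
    bbox: Tuple[int, int, int, int]
    content: Optional[str] = None


def fill_blocks(
    layout: List[Block],
    recognizers: Dict[str, Callable[[Block], str]],
) -> List[Block]:
    """Keep the layout as given and fill content with a per-kind recognizer."""
    filled: List[Block] = []
    for block in layout:
        recognize = recognizers.get(block.kind)
        # Blocks without a matching recognizer keep whatever content they had.
        content = recognize(block) if recognize is not None else block.content
        filled.append(Block(block.kind, block.bbox, content))
    return filled
```

In this sketch the layout list would come from the structure-detecting model, while the recognizers stand in for the specialized text and formula models; block order and bounding boxes are never changed by the fill step.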
- _load_hybrid_analyze_entrypoint dynamically loads the hybrid module to ensure local dependencies such as torch are present (mineru/cli/common.py76-87).
- The hybrid backend requires the mineru[pipeline] dependencies to function (mineru/cli/common.py60-66).

Regardless of the backend used, all data is eventually transformed into a common schema known as middle_json. This format decouples model-specific output from the final document generation logic.
Data Flow to Output
Sources: mineru/cli/common.py24-30 mineru/backend/pipeline/model_json_to_middle_json.py72-81 mineru/backend/pipeline/pipeline_analyze.py12-17
| Component | Responsibility |
|---|---|
| middle_json | The internal intermediate representation that standardizes all model outputs mineru/cli/common.py24-25 |
| union_make | The final assembly function that handles language-aware text merging and formatting for different modes (Markdown, Content List V1/V2) mineru/cli/common.py24-25 |
| MakeMode | Defines output targets: MM_MD, CONTENT_LIST, CONTENT_LIST_V2 mineru/cli/common.py21 |
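The relationship between the components in the table can be sketched as a mode-driven renderer over one intermediate structure. This is an illustration under stated assumptions: `SketchMakeMode` only borrows the mode names from the table (its string values are invented), the `pages`/`blocks` layout is a hypothetical simplification of middle_json, and `render_sketch` is not the real union_make.

```python
from enum import Enum
from typing import Any, Dict, List, Union


class SketchMakeMode(str, Enum):
    """Mode names follow the table; the string values are assumptions."""
    MM_MD = "mm_md"
    CONTENT_LIST = "content_list"
    CONTENT_LIST_V2 = "content_list_v2"


def render_sketch(
    middle: Dict[str, Any], mode: SketchMakeMode
) -> Union[str, List[Dict[str, Any]]]:
    """Produce a different output shape from the same intermediate data."""
    # Flatten the hypothetical page/block structure, keeping the page index.
    blocks = [
        (page_idx, block)
        for page_idx, page in enumerate(middle["pages"])
        for block in page["blocks"]
    ]
    if mode is SketchMakeMode.MM_MD:
        return "\n\n".join(block["text"] for _, block in blocks)
    if mode is SketchMakeMode.CONTENT_LIST:
        return [
            {"type": block["type"], "text": block["text"], "page_idx": page_idx}
            for page_idx, block in blocks
        ]
    # The V2 schema is not described in this overview, so it is left out.
    raise NotImplementedError(f"mode not covered by this sketch: {mode.value}")
```

The point of the design is that adding an output target means adding a branch here, with no change to any backend.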
For details, see middle_json Format & Content Generation.
The system is exposed via several entry points that all converge on the do_parse or aio_do_parse logic.
- CLI client: uses LiveTaskStatusRenderer (mineru/cli/client.py179-183). It handles task planning and unique naming for output files via uniquify_task_stems (mineru/cli/client.py134-168).
- API: exposes a /tasks endpoint. It supports asynchronous task submission and uses the core parsing logic.
- Gradio app: uses ReusableLocalAPIServer (mineru/cli/gradio_app.py54) and a concurrency limiter, GradioRequestConcurrencyLimiter (mineru/cli/gradio_app.py72-133).
- Output regeneration: the regenerate_client_side_outputs function allows re-rendering Markdown/JSON from an existing middle_json without re-running heavy inference (mineru/cli/client.py51).

Sources: mineru/cli/client.py134-168 mineru/cli/gradio_app.py54-133 mineru/cli/common.py110-118 mineru/cli/client.py51
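Unique naming of output files, as mentioned for the CLI client, can be illustrated with a small helper. This is a sketch of one possible scheme, not the behavior of uniquify_task_stems: the function name `uniquify_stems_sketch` and the `_1`, `_2` suffix convention are assumptions.

```python
from pathlib import Path
from typing import Dict, Iterable, List, Set


def uniquify_stems_sketch(paths: Iterable[str]) -> List[str]:
    """Return one output stem per input path, adding numeric suffixes on clashes."""
    seen: Set[str] = set()
    counters: Dict[str, int] = {}
    result: List[str] = []
    for path in paths:
        stem = Path(path).stem
        candidate = stem
        n = counters.get(stem, 0)
        # Keep bumping the suffix until the candidate is unused.
        while candidate in seen:
            n += 1
            candidate = f"{stem}_{n}"
        counters[stem] = n
        seen.add(candidate)
        result.append(candidate)
    return result
```

With a scheme like this, two inputs named report.pdf in different folders, or a report.pdf next to a report.png, would not overwrite each other's output directory.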