MinerU is a high-performance tool designed to convert complex PDF documents and images into structured, machine-readable Markdown and JSON formats. It is specifically optimized for high-quality data extraction to support Large Language Model (LLM) training and Retrieval-Augmented Generation (RAG) pipelines.
The system is defined as a "practical document parsing tool for converting PDF, images, DOCX, PPTX, and XLSX into Markdown and JSON" (pyproject.toml:10). It supports a wide range of document elements, including multi-column layouts, mathematical formulas (as LaTeX), tables, and multi-language text. The current version is 3.4.0 (mineru/version.py:1).
MinerU employs a multi-backend architecture that lets users trade off speed, accuracy, and hardware availability. Orchestration is handled primarily by the `do_parse` and `aio_do_parse` functions, which dispatch tasks to a specific backend based on the configuration.
| Backend | Code Identifier | Description |
|---|---|---|
| Pipeline | `pipeline` | A traditional multi-model pipeline using layout detection, OCR, and formula detection and recognition (MFD/MFR). |
| VLM | `vlm-auto-engine` | High accuracy via local Vision-Language Models (e.g., Qwen2-VL). |
| Hybrid | `hybrid-auto-engine` | Combines VLM layout understanding with traditional OCR and formula models for high-fidelity reconstruction. |
| HTTP Client | `*-http-client` | Offloads inference to remote OpenAI-compatible or VLM servers. |
| Office | `docx`, `pptx`, `xlsx` | Native conversion for Office documents, bypassing PDF rendering for a roughly 10x speedup. |
Sources: pyproject.toml:74-118, README.md:31-49, README_zh-CN.md:31-49
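The identifier column of the backend table can be read as a small lookup from configuration string to backend family. The sketch below illustrates that idea only: `classify_backend`, `BACKEND_PATTERNS`, and the family labels are hypothetical names introduced here, and this is not how `do_parse` is implemented internally.

```python
from fnmatch import fnmatchcase

# Identifier patterns copied from the backend table. The family labels on the
# right are illustrative assumptions, not names used inside MinerU.
BACKEND_PATTERNS = [
    ("*-http-client", "http-client"),  # wildcard row from the table
    ("pipeline", "pipeline"),
    ("vlm-auto-engine", "vlm"),
    ("hybrid-auto-engine", "hybrid"),
    ("docx", "office"),
    ("pptx", "office"),
    ("xlsx", "office"),
]


def classify_backend(identifier: str) -> str:
    """Return the backend family whose pattern matches the identifier."""
    for pattern, family in BACKEND_PATTERNS:
        # fnmatchcase gives case-sensitive matching on every platform.
        if fnmatchcase(identifier, pattern):
            return family
    raise ValueError(f"unknown backend identifier: {identifier!r}")
```

Under these assumptions, any identifier ending in `-http-client` maps to the remote family, and an unrecognized string fails loudly instead of silently falling back to a default backend.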
The following diagram illustrates how the system transitions from the CLI entry point to the specialized backend engines and finally to the structured output.
Diagram: MinerU Dispatch Architecture
Sources: pyproject.toml:128-135, README.md:10-25, README_zh-CN.md:31-49
- `fast-langdetect` and language-specific OCR engines (pyproject.toml:49)
- `vllm`, `lmdeploy`, `mlx`, and `transformers` backends for vision understanding (pyproject.toml:74-89)

MinerU bridges the gap between raw document pixels/bytes and structured data through a series of specialized modules and model singletons that manage resource lifecycle.
Diagram: Code Entity Mapping
Sources: pyproject.toml:107-118, README.md:10-25
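As a general illustration of what a model singleton that manages resource lifecycle can look like, here is a minimal sketch. `ModelRegistry`, its `get` and `release` methods, and the `(name, device)` cache key are all hypothetical; MinerU's own singleton classes may be organized quite differently.

```python
import threading
from typing import Any, Callable, Dict, Tuple


class ModelRegistry:
    """Hypothetical process-wide cache: each (name, device) loads at most once."""

    _instance = None
    _instance_lock = threading.Lock()

    def __new__(cls):
        # Every call to ModelRegistry() returns the same shared object.
        with cls._instance_lock:
            if cls._instance is None:
                inst = super().__new__(cls)
                inst._models: Dict[Tuple[str, str], Any] = {}
                inst._lock = threading.Lock()
                cls._instance = inst
        return cls._instance

    def get(self, name: str, device: str, loader: Callable[[], Any]) -> Any:
        """Return the cached model, calling loader() only on first use."""
        key = (name, device)
        with self._lock:
            if key not in self._models:
                self._models[key] = loader()
            return self._models[key]

    def release(self, name: str, device: str) -> None:
        """Drop the cached model so its memory can be reclaimed."""
        with self._lock:
            self._models.pop((name, device), None)
```

The design choice being illustrated is that expensive weights are loaded lazily and shared across calls, while an explicit release step lets a long-running process free memory when a backend is no longer needed.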
MinerU provides several ways to interact with the engine:
- `mineru` command for batch processing (pyproject.toml:129)
- `mineru-api` for remote integration and async task management (pyproject.toml:134)
- `mineru-router` for managing multiple worker instances (pyproject.toml:135)
- `mineru-gradio` for interactive use and visualization (pyproject.toml:136)
- `do_parse` or `aio_do_parse` for programmatic control
- `mineru-vllm-server`, `mineru-lmdeploy-server`, and `mineru-openai-server` server commands (pyproject.toml:130-132)
- `mineru-models-download` for automated model weight acquisition (pyproject.toml:133)

For details on setting up the environment and running your first conversion, see Getting Started & Installation. For a complete list of configuration options and environment variables, see Configuration Reference.
Sources: pyproject.toml:128-137, README.md:1-25, mineru/version.py:1, README_zh-CN.md:1-30
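To see which of the command-line entry points named above are available in the current environment, a small standard-library check can help. This is a sketch, not part of MinerU: `find_installed_commands` and `MINERU_COMMANDS` are hypothetical helper names, and the check only looks for executables on `PATH` without running them.

```python
import shutil
from typing import Dict, Iterable

# Command names taken from the entry points listed in this overview.
MINERU_COMMANDS = (
    "mineru",
    "mineru-api",
    "mineru-router",
    "mineru-gradio",
    "mineru-vllm-server",
    "mineru-lmdeploy-server",
    "mineru-openai-server",
    "mineru-models-download",
)


def find_installed_commands(
    commands: Iterable[str] = MINERU_COMMANDS,
) -> Dict[str, bool]:
    """Map each command name to whether an executable is found on PATH."""
    return {name: shutil.which(name) is not None for name in commands}


if __name__ == "__main__":
    for name, present in find_installed_commands().items():
        print(f"{name:26s} {'found' if present else 'missing'}")
```

A missing command may simply mean that an optional dependency group was not installed, so the result is best read as a hint rather than a diagnosis.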