This page documents the web-based entry points for MinerU: the FastAPI server, the mineru-router load balancer, and the Gradio web interface. These interfaces serve as high-level wrappers around the asynchronous parsing pipeline, enabling remote access, multi-GPU orchestration, and interactive document conversion.
MinerU provides three primary web-based modes of operation:
- FastAPI server (mineru-api): A high-performance REST API designed for programmatic integration and batch processing (mineru/cli/fast_api.py:212-221).
- Router (mineru-router): A load balancer that aggregates multiple mineru-api upstreams or manages local GPU workers (mineru/cli/router.py:21-23).
- Gradio web interface (mineru-gradio): A user-friendly web UI for interactive document uploading, real-time previewing, and result downloading (mineru/cli/gradio_app.py:19-21).

All interfaces rely on the same underlying asynchronous orchestration logic provided by aio_do_parse (mineru/cli/common.py:35).
The following diagram illustrates how the web interfaces interact with the core parsing engine and the task management system.

Diagram: Web Interface to Code Entity Mapping

Sources: mineru/cli/fast_api.py:142-171, mineru/cli/router.py:21-23, mineru/cli/common.py:35, mineru/cli/common.py:24-30, mineru/cli/api_client.py:54
FastAPI Server (mineru-api)

The FastAPI server (fast_api.py) is the core service component. It manages a task queue and provides endpoints for document processing.

| Endpoint | Method | Description |
|---|---|---|
| /tasks | POST | Asynchronous submission via submit_async_task. Returns a task_id immediately (mineru/cli/fast_api.py:355-375). |
| /tasks/{task_id}/status | GET | Returns the AsyncParseTask state: pending, processing, completed, or failed (mineru/cli/fast_api.py:78-81). |
| /tasks/{task_id}/result | GET | Downloads the final ZIP result if the task is terminal (mineru/cli/fast_api.py:420-442). |
| /file_parse | POST | Legacy synchronous endpoint using do_file_parse (mineru/cli/fast_api.py:292-301). |
| /health | GET | Returns server metadata and API_PROTOCOL_VERSION (mineru/cli/fast_api.py:279-288). |
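
The submit, poll, download flow implied by the table can be sketched from the client side. This is a minimal sketch, not the project's own client: the `wait_for_task` and `http_fetch_status` helpers are hypothetical, the status payload is assumed to carry the state under a `"status"` key, and `completed` and `failed` are assumed to be the terminal states.

```python
import json
import time
import urllib.request
from typing import Callable, Dict

# Assumption: these two states from the table are the terminal ones.
TERMINAL_STATES = {"completed", "failed"}


def http_fetch_status(base_url: str) -> Callable[[str], Dict]:
    """Build a status fetcher for GET /tasks/{task_id}/status (hypothetical helper)."""
    def fetch(task_id: str) -> Dict:
        with urllib.request.urlopen(f"{base_url}/tasks/{task_id}/status") as resp:
            return json.load(resp)
    return fetch


def wait_for_task(
    fetch_status: Callable[[str], Dict],
    task_id: str,
    interval: float = 1.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict:
    """Poll until the task reaches a terminal state or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        payload = fetch_status(task_id)
        # Assumption: the state lives under a "status" key in the payload.
        if payload.get("status") in TERMINAL_STATES:
            return payload
        if time.monotonic() >= deadline:
            raise TimeoutError(f"task {task_id} not finished after {timeout}s")
        sleep(interval)
```

A caller would pass `http_fetch_status("http://host:port")` as `fetch_status` and request `/tasks/{task_id}/result` only after a `completed` payload comes back. Injecting the fetcher keeps the polling loop testable without a running server.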
AsyncParseTask

The server tracks requests using the AsyncParseTask dataclass (mineru/cli/fast_api.py:142-171):

- Retention: tasks are kept for DEFAULT_TASK_RETENTION_SECONDS (24h) (mineru/cli/fast_api.py:85).
- Cleanup: cleanup_expired_tasks runs at intervals defined by DEFAULT_TASK_CLEANUP_INTERVAL_SECONDS (mineru/cli/fast_api.py:86).
- Concurrency: an asyncio.Semaphore (_request_semaphore) limits active processing (mineru/cli/fast_api.py:96).
- Queueing: tasks that are waiting stay in the pending state. The to_status_payload method includes queued_ahead to inform the client of its position (mineru/cli/fast_api.py:173-196).

Sources: mineru/cli/fast_api.py:78-104, mineru/cli/fast_api.py:142-196, mineru/cli/fast_api.py:355-442, mineru/cli/api_protocol.py:2-4
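
The interplay of retention, semaphore-bounded processing, and queue position can be illustrated with a small in-memory registry. This is a sketch under stated assumptions: `SketchTask` and `SketchTaskRegistry` are hypothetical names, not the real AsyncParseTask implementation, and the way `queued_ahead` is computed here (counting earlier pending tasks) is only one plausible reading.

```python
import asyncio
import time
from dataclasses import dataclass, field
from typing import Awaitable, Callable, Dict, Optional

RETENTION_SECONDS = 24 * 60 * 60  # mirrors the 24h retention mentioned above


@dataclass
class SketchTask:
    task_id: str
    status: str = "pending"
    created_at: float = field(default_factory=time.time)
    finished_at: Optional[float] = None


class SketchTaskRegistry:
    """Hypothetical stand-in for the server's task bookkeeping."""

    def __init__(self, max_concurrent: int = 1) -> None:
        self.tasks: Dict[str, SketchTask] = {}
        self._semaphore = asyncio.Semaphore(max_concurrent)

    def submit(self, task_id: str) -> SketchTask:
        task = SketchTask(task_id)
        self.tasks[task_id] = task
        return task

    def queued_ahead(self, task_id: str) -> int:
        # Count pending tasks submitted earlier (dicts preserve insertion order).
        count = 0
        for tid, task in self.tasks.items():
            if tid == task_id:
                break
            if task.status == "pending":
                count += 1
        return count

    async def run(self, task_id: str, work: Callable[[], Awaitable[None]]) -> None:
        task = self.tasks[task_id]
        async with self._semaphore:  # at most max_concurrent tasks process at once
            task.status = "processing"
            try:
                await work()
                task.status = "completed"
            except Exception:
                task.status = "failed"
            finally:
                task.finished_at = time.time()

    def cleanup_expired(self, now: Optional[float] = None) -> int:
        # Remove finished tasks older than the retention window; return how many.
        now = time.time() if now is None else now
        expired = [
            tid for tid, t in self.tasks.items()
            if t.finished_at is not None and now - t.finished_at > RETENTION_SECONDS
        ]
        for tid in expired:
            del self.tasks[tid]
        return len(expired)
```

In a server, `cleanup_expired` would be called from a periodic background coroutine, and the registry would be created inside the running event loop.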
Router (mineru-router)

The mineru-router acts as a reverse proxy and load balancer for multiple mineru-api instances.

- Local workers: local GPU workers are configured through LOCAL_GPU_AUTO (mineru/cli/router.py:70).
- Upstreams: existing mineru-api upstreams or remote sources are supplied via upstream_urls (mineru/cli/router.py:68-69).
- Health tracking: WorkerPool periodically refreshes worker status and monitors UPSTREAM_FAILURE_THRESHOLD (mineru/cli/router.py:72-76).
- Submission: a POST /tasks request is handled by parse_request_form (mineru/cli/router.py:44).
- Dispatch: the request goes to a worker from the WorkerPool, maintaining task affinity.
- Affinity: the router maps the task_id to the specific upstream worker for status and result consistency.

Sources: mineru/cli/router.py:44-78, mineru/cli/api_client.py:28-42, mineru/cli/api_protocol.py:2-4
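
Task affinity and failure-threshold handling can be sketched as follows. Everything here is illustrative: `SketchWorker` and `SketchWorkerPool` are hypothetical, the threshold value of 3 is made up, and least-loaded selection is an assumed policy rather than what the real WorkerPool is known to do.

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical value; the real UPSTREAM_FAILURE_THRESHOLD is not shown in this page.
FAILURE_THRESHOLD = 3


@dataclass
class SketchWorker:
    url: str
    active_tasks: int = 0
    consecutive_failures: int = 0

    @property
    def healthy(self) -> bool:
        return self.consecutive_failures < FAILURE_THRESHOLD


class SketchWorkerPool:
    def __init__(self, upstream_urls: List[str]) -> None:
        self.workers = [SketchWorker(u) for u in upstream_urls]
        self.task_to_worker: Dict[str, SketchWorker] = {}

    def pick_worker(self) -> SketchWorker:
        # Assumed policy: choose the healthy worker with the fewest active tasks.
        healthy = [w for w in self.workers if w.healthy]
        if not healthy:
            raise RuntimeError("no healthy upstream available")
        return min(healthy, key=lambda w: w.active_tasks)

    def assign(self, task_id: str) -> SketchWorker:
        # Remember which worker owns the task so later status and result
        # requests reach the same upstream.
        worker = self.pick_worker()
        worker.active_tasks += 1
        self.task_to_worker[task_id] = worker
        return worker

    def worker_for(self, task_id: str) -> SketchWorker:
        return self.task_to_worker[task_id]

    def record_health(self, url: str, ok: bool) -> None:
        # A successful check resets the counter; a failed one increments it.
        for w in self.workers:
            if w.url == url:
                w.consecutive_failures = 0 if ok else w.consecutive_failures + 1
```

The `task_to_worker` map is what makes status and result lookups consistent: a task submitted to one upstream cannot be resolved by another, so the router has to remember the owner.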
Gradio Web Interface (gradio_app.py)

The Gradio interface provides a visual frontend for the MinerU pipeline, supporting interactive exploration of parsing results.

- Uses ReusableLocalAPIServer to manage a dedicated mineru-api instance in the background (mineru/cli/gradio_app.py:54).
- Uses GradioRequestConcurrencyLimiter, built on asyncio.Semaphore and threading.Lock, to manage UI-side pressure (mineru/cli/gradio_app.py:72-75).
- Uses GradioConcurrencyWaitSnapshot to show users their position in the local queue (mineru/cli/gradio_app.py:57-63).
- Uses VisualizationJob to generate side-by-side comparisons of original PDFs and detected layout blocks (mineru/cli/gradio_app.py:51).
- Uses gr.File for document selection and a PDF component for preview (mineru/cli/gradio_app.py:20-21).
- Uses gr.Textbox with custom JavaScript autoscroll (STATUS_BOX_AUTOSCROLL_JS) for real-time processing logs (mineru/cli/gradio_app.py:202-219).
- Loads custom CSS (gradio_app.css) and JS (gradio_app.js) assets for styling (mineru/resources/gradio_app.css:1-16, mineru/resources/gradio_app.js:1-10).

Sources: mineru/cli/gradio_app.py:51-219, mineru/cli/api_client.py:54, mineru/resources/gradio_app.css:1-16, mineru/resources/gradio_app.js:1-10
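
The combination of an asyncio.Semaphore with a threading.Lock, plus a snapshot of the queue state, might look roughly like this. It is a sketch only: `SketchConcurrencyLimiter` and `WaitSnapshot` are hypothetical stand-ins, and the fields of the real GradioConcurrencyWaitSnapshot are not known from this page.

```python
import asyncio
import threading
from contextlib import asynccontextmanager
from dataclasses import dataclass


@dataclass(frozen=True)
class WaitSnapshot:
    """Hypothetical snapshot shape: how many requests run and how many wait."""
    active: int
    waiting: int


class SketchConcurrencyLimiter:
    def __init__(self, limit: int) -> None:
        self._semaphore = asyncio.Semaphore(limit)  # bounds concurrent requests
        self._lock = threading.Lock()               # guards the counters below
        self._active = 0
        self._waiting = 0

    def snapshot(self) -> WaitSnapshot:
        # Safe to call from any thread, e.g. a UI callback polling for status.
        with self._lock:
            return WaitSnapshot(self._active, self._waiting)

    @asynccontextmanager
    async def slot(self):
        with self._lock:
            self._waiting += 1
        try:
            await self._semaphore.acquire()
        finally:
            # Runs whether the acquire succeeded or was cancelled.
            with self._lock:
                self._waiting -= 1
        with self._lock:
            self._active += 1
        try:
            yield
        finally:
            with self._lock:
                self._active -= 1
            self._semaphore.release()
```

A request handler would wrap its work in `async with limiter.slot():`, while the status box reads `limiter.snapshot()` to report how many requests are ahead.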
Asynchronous Orchestration (aio_do_parse)

Both the FastAPI and Gradio interfaces invoke aio_do_parse as the primary entry point for non-blocking execution. This function handles the routing between backends.
Diagram: Pipeline Orchestration Sequence

Sources: mineru/cli/common.py:35, mineru/cli/common.py:24-30, mineru/cli/common.py:69-74
Helper Functions in common.py

- uniquify_task_stems: Assigns task-local unique stems while preserving input order to prevent collisions (mineru/cli/common.py:134-168).
- normalize_upload_filename: Sanitizes user-provided filenames to be filesystem-safe (mineru/cli/common.py:114-118).
- prepare_env: Creates the necessary directory structure (/images subfolder) for a specific task (mineru/cli/common.py:186-191).
- ensure_backend_dependencies: Dynamically checks for torch before attempting to load specific backend modules (mineru/cli/common.py:69-74).
- read_fn: Handles initial file reading and converts image inputs into PDF bytes using images_bytes_to_pdf_bytes (mineru/cli/common.py:171-183).

Sources: mineru/cli/common.py:69-191
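
Filename sanitizing and order-preserving stem deduplication can be approximated as below. These are illustrative re-implementations with hypothetical names (`sketch_normalize_filename`, `sketch_uniquify_stems`); the exact rules of the real normalize_upload_filename and uniquify_task_stems, including the suffix format and the fallback name, are assumptions.

```python
import re
from pathlib import Path
from typing import List


def sketch_normalize_filename(name: str) -> str:
    """Reduce a user-supplied name to a single filesystem-safe component."""
    # Treat backslashes as separators, then drop any directory components.
    base = Path(name.replace("\\", "/")).name
    # Replace runs of characters other than word characters, dots, and hyphens.
    cleaned = re.sub(r"[^\w.-]+", "_", base).strip("._")
    return cleaned or "upload"  # hypothetical fallback for empty results


def sketch_uniquify_stems(stems: List[str]) -> List[str]:
    """Make stems unique within one task while keeping the input order."""
    seen = set()
    result = []
    for stem in stems:
        candidate, n = stem, 1
        while candidate in seen:
            n += 1
            candidate = f"{stem}_{n}"  # assumed suffix scheme: name, name_2, name_3
        seen.add(candidate)
        result.append(candidate)
    return result
```

Deduplicating per task rather than globally keeps output names predictable: two uploads named `report.pdf` in the same request get distinct stems, while the same name in a different task is unaffected.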