This page provides a detailed technical reference for MinerU's configuration system, including the mineru.json file, environment variables, and the config_reader module that resolves these settings at runtime.
MinerU uses a multi-layered configuration approach. Settings are first defined in a template, then overridden by a user-provided JSON file, and can finally be superseded by environment variables at runtime.
The following diagram illustrates how the config_reader module resolves settings from different sources.
Configuration Resolution Process
Sources: mineru/utils/config_reader.py14-30 mineru/utils/config_reader.py33-60 mineru/utils/config_reader.py63-78 mineru/utils/config_reader.py105-137 mineru/utils/config_reader.py140-155 mineru/utils/config_reader.py158-169 mineru/utils/os_env_config.py15-17
The primary configuration file is mineru.json. By default, MinerU looks for this file in the user's home directory (~/mineru.json). Setting the MINERU_TOOLS_CONFIG_JSON environment variable overrides this location mineru/utils/config_reader.py14-23.
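The lookup described above can be sketched as follows. This is a minimal illustration, not the code in config_reader.py: the function names resolve_config_path and load_config are hypothetical, and resolving a relative override against the home directory is an assumption of this sketch.

```python
import json
import os
from pathlib import Path

CONFIG_ENV_VAR = "MINERU_TOOLS_CONFIG_JSON"
DEFAULT_CONFIG_NAME = "mineru.json"


def resolve_config_path(env=None, home=None):
    """Return the path where the configuration file is expected (hypothetical helper)."""
    env = os.environ if env is None else env
    home = Path.home() if home is None else Path(home)
    override = env.get(CONFIG_ENV_VAR)
    if override:
        candidate = Path(override)
        # Absolute paths are used as-is. Resolving a relative value against
        # the home directory is an assumption of this sketch.
        return candidate if candidate.is_absolute() else home / candidate
    return home / DEFAULT_CONFIG_NAME


def load_config(path):
    """Parse the JSON file, or return None when it does not exist."""
    path = Path(path)
    if not path.exists():
        return None
    with path.open(encoding="utf-8") as handle:
        return json.load(handle)
```

Passing env and home explicitly keeps the helper easy to test without touching the real environment or home directory.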
| Section | Description |
|---|---|
| models-dir | Local directory path where model weights are stored. Supports separate paths for pipeline and vlm backends mineru/utils/config_reader.py219-226 mineru.template.json25-28 |
| model-source | Remote repository source. Supports huggingface, modelscope, or auto mineru/utils/config_reader.py33-60 mineru.template.json29 |
| bucket_info | Dictionary containing S3/MinIO credentials (AK, SK, Endpoint) indexed by bucket name mineru/utils/config_reader.py63-78 |
| latex-delimiter-config | Defines delimiters for inline and display formulas (e.g., $ vs $$) mineru/utils/config_reader.py195-204 mineru.template.json6-15 |
| llm-aided-config | Configuration for LLM-assisted structural refinement, specifically title_aided mineru/utils/config_reader.py207-216 mineru.template.json16-24 |
| config_version | Tracks the schema version of the configuration file mineru.template.json30 |
The llm-aided-config block enables the llm_aided module to refine document structure using OpenAI-compatible APIs mineru/utils/llm_aided.py160-167
| Key | Description |
|---|---|
| api_key | API key for the LLM provider mineru.template.json18 |
| base_url | Base URL for the OpenAI-compatible endpoint mineru.template.json19 |
| model | The specific model ID to use (e.g., qwen3.5-plus) mineru.template.json20 |
| enable_thinking | Enables reasoning/thinking blocks. If enabled, the logic strips </think> tags mineru/utils/llm_aided.py184-200 |
| enable | Boolean toggle to activate LLM refinement at runtime mineru.template.json22 |
Sources: mineru/utils/config_reader.py207-216 mineru/utils/llm_aided.py160-200 mineru.template.json1-32
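To make the enable and enable_thinking keys concrete, here is a hedged sketch of how a caller might consume the title_aided block. Both function names are hypothetical, and cutting the response at the last </think> marker is an assumption about what stripping the tag means in practice. The code in llm_aided.py may differ.

```python
def get_title_aided_config(config):
    """Return the title_aided block only when it is present and enabled."""
    section = (config or {}).get("llm-aided-config", {})
    title_aided = section.get("title_aided")
    if not title_aided or not title_aided.get("enable", False):
        return None
    return title_aided


def strip_thinking(text):
    """Keep only the text after the last closing think tag (assumed behavior)."""
    marker = "</think>"
    if marker in text:
        return text.rsplit(marker, 1)[1].strip()
    return text.strip()
```

Returning None for a disabled block lets callers skip the LLM call with a single check.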
Environment variables provide the highest level of override and are frequently used in Docker and CI/CD environments.
- MINERU_DEVICE_MODE: Manually force the device type (e.g., cpu, cuda, mps, npu, musa, mlu). If not set, get_device() performs auto-detection mineru/utils/config_reader.py105-137
- MINERU_TOOLS_CONFIG_JSON: Specifies an absolute or relative path to the configuration JSON file mineru/utils/config_reader.py14-23
- MINERU_MODEL_SOURCE: Configures model source. Supported values: huggingface, modelscope, local. Overrides mineru.json docs/en/usage/model_source.md11-12
- MINERU_FORMULA_ENABLE: Global toggle for formula detection and recognition mineru/utils/config_reader.py140-143
- MINERU_TABLE_ENABLE: Global toggle for table recognition mineru/utils/config_reader.py146-149
- MINERU_OCR_DET_MASK_INLINE_FORMULA_ENABLE: If enabled, masks detected inline formulas during the OCR detection phase mineru/utils/config_reader.py152-155
- MINERU_PROCESSING_WINDOW_SIZE: Sets the window size for document processing chunks. Defaults to 64 mineru/utils/config_reader.py158-169
- MINERU_API_MAX_CONCURRENT_REQUESTS: Limits concurrent requests in API mode. Defaults to 3 mineru/utils/config_reader.py172-192
- MINERU_PDF_RENDER_THREADS: Number of threads used for PDF rendering. Defaults to 3 mineru/utils/os_env_config.py15-17
- MINERU_PDF_RENDER_TIMEOUT: Timeout in seconds for PDF rendering operations. Defaults to 300 mineru/utils/os_env_config.py10-12

Sources: mineru/utils/config_reader.py105-192 mineru/utils/os_env_config.py10-28 docs/en/usage/model_source.md11-26
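Several of these variables are integers with documented defaults, and others are on/off toggles. The sketch below shows one way such values could be parsed. The helper names, the fallback to the default on invalid input, and the set of accepted truthy strings are all assumptions made for illustration, not the behavior of config_reader.py.

```python
import os


def read_int_env(name, default, env=None, minimum=1):
    """Read an integer variable, falling back to the default when unset or invalid."""
    env = os.environ if env is None else env
    raw = env.get(name)
    if raw is None or raw.strip() == "":
        return default
    try:
        value = int(raw)
    except ValueError:
        # Falling back silently is an assumption of this sketch.
        return default
    return value if value >= minimum else default


def read_bool_env(name, default, env=None):
    """Read a toggle. The accepted truthy strings are an assumption."""
    env = os.environ if env is None else env
    raw = env.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in {"1", "true", "yes", "on"}
```

For example, read_int_env("MINERU_PROCESSING_WINDOW_SIZE", 64) would yield the documented default whenever the variable is not set.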
The config_reader module provides utility functions to resolve settings dynamically.
The get_device() function in config_reader.py follows a specific priority order:
1. MINERU_DEVICE_MODE environment variable.
2. torch.cuda.is_available() -> cuda.
3. torch.backends.mps.is_available() -> mps.
4. npu (Ascend), gcu (Enflame), musa (Moore Threads), mlu (Cambricon), sdaa (Tecorigin).
5. cpu.

Hardware Detection Sequence
Sources: mineru/utils/config_reader.py105-137
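The priority order above amounts to "explicit override first, then the first available accelerator, then cpu". A minimal sketch of that pattern follows. The names pick_device and default_probes are hypothetical, and the vendor-specific checks (npu, gcu, musa, mlu, sdaa) are left out because their probing calls are not shown on this page.

```python
import os


def pick_device(probes, env=None):
    """Return the forced device, else the first probe that succeeds, else 'cpu'."""
    env = os.environ if env is None else env
    forced = env.get("MINERU_DEVICE_MODE")
    if forced:
        return forced
    for name, is_available in probes:
        try:
            if is_available():
                return name
        except Exception:
            # A probe that raises is treated as "not available" in this sketch.
            continue
    return "cpu"


def default_probes():
    """Build the CUDA and MPS probes when PyTorch is installed."""
    try:
        import torch
    except ImportError:
        return []
    return [
        ("cuda", torch.cuda.is_available),
        ("mps", torch.backends.mps.is_available),
    ]
```

Calling pick_device(default_probes()) reproduces the first three steps and the final fallback. Additional accelerators would be appended to the probe list in priority order.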
MinerU handles model sources with logic in models_download_utils.py and config_reader.py. When the source is set to auto, MinerU probes whether HuggingFace is reachable mineru/utils/models_download_utils.py190-200. Once resolved, it persists the actual source to mineru.json mineru/utils/models_download_utils.py88-99.
Model Source Resolution Logic
Sources: mineru/utils/models_download_utils.py88-101 mineru/utils/models_download_utils.py189-204 docs/en/usage/model_source.md25-31
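The resolution flow can be illustrated with a small sketch. Several details here are assumptions rather than facts from the source: the function name resolve_model_source, treating a missing setting as auto, and falling back to modelscope when the HuggingFace probe fails. The reachability check is injected as a callable so the example stays independent of any network library.

```python
import os


def resolve_model_source(config, can_reach_huggingface, env=None):
    """Resolve the effective model source (hypothetical helper).

    The environment variable wins over the config file. A value of 'auto'
    is replaced by a concrete source and recorded in the config dict.
    """
    env = os.environ if env is None else env
    # Treating a missing setting as 'auto' is an assumption of this sketch.
    source = env.get("MINERU_MODEL_SOURCE") or config.get("model-source") or "auto"
    if source == "auto":
        # Falling back to modelscope is an assumption of this sketch.
        source = "huggingface" if can_reach_huggingface() else "modelscope"
        # The caller would write the updated dict back to mineru.json.
        config["model-source"] = source
    return source
```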
When models are downloaded via the mineru-models-download CLI command, the system automatically updates the configuration file. The persist_downloaded_model_config function writes the local path for the downloaded repository (either pipeline or vlm) and the source used into mineru.json mineru/utils/models_download_utils.py168-187
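As a rough illustration of this update step, the sketch below merges a downloaded repository path and its source into an existing JSON file. It is not the body of persist_downloaded_model_config. The function name persist_model_config and its parameters are hypothetical, and only the models-dir and model-source keys from the table above are touched.

```python
import json
from pathlib import Path


def persist_model_config(config_path, repo_mode, local_dir, source):
    """Record a downloaded model directory and its source in the config file.

    repo_mode is expected to be 'pipeline' or 'vlm'. Other keys already in
    the file are preserved.
    """
    path = Path(config_path)
    config = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    models_dir = config.get("models-dir")
    if not isinstance(models_dir, dict):
        # Start a fresh mapping when the key is missing or malformed.
        models_dir = {}
    models_dir[repo_mode] = str(local_dir)
    config["models-dir"] = models_dir
    config["model-source"] = source
    path.write_text(json.dumps(config, indent=4, ensure_ascii=False), encoding="utf-8")
    return config
```

Reading the file before writing keeps unrelated sections such as bucket_info or llm-aided-config intact across downloads.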
| Function | Role |
|---|---|
| read_config() | Loads and parses mineru.json mineru/utils/config_reader.py17-30 |
| get_configured_model_source() | Reads the model source from config mineru/utils/config_reader.py33-60 |
| get_s3_config(bucket_name) | Retrieves S3 credentials mineru/utils/config_reader.py63-78 |
| get_device() | Detects hardware acceleration mineru/utils/config_reader.py105-137 |
| get_latex_delimiter_config() | Returns formula delimiter settings mineru/utils/config_reader.py195-204 |
| get_llm_aided_config() | Retrieves structural refinement config mineru/utils/config_reader.py207-216 |
| get_local_models_dir() | Returns local model directory mineru/utils/config_reader.py219-226 |

| Function | Role |
|---|---|
| get_load_images_threads() | Returns PDF rendering thread count mineru/utils/os_env_config.py15-17 |
| get_load_images_timeout() | Returns PDF rendering timeout mineru/utils/os_env_config.py10-12 |
Sources: mineru/utils/config_reader.py17-226 mineru/utils/os_env_config.py5-28 mineru/utils/models_download_utils.py168-187