Deployment & Infrastructure

This page provides a high-level overview of the deployment strategies and infrastructure support for MinerU. It covers containerization, hardware acceleration across diverse platforms, and scaling to multi-GPU or enterprise environments.

Deployment Overview

MinerU is designed to be portable across different environments, ranging from local CPU-only machines to high-performance GPU clusters. The system leverages vLLM and lmdeploy for inference acceleration and provides several entry points for different use cases docs/zh/quick_start/docker_deployment.md18-25

System Entry Points and Infrastructure

The following diagram illustrates how different deployment modes (CLI, API, Web) interact with the underlying hardware and inference engines.

Infrastructure Dispatch Diagram

Sources: docs/zh/quick_start/docker_deployment.md18-25 docker/compose.yaml60-93 docs/en/quick_start/docker_deployment.md58-67

Docker Deployment

MinerU provides specialized Docker environments to simplify dependency management, especially for complex inference frameworks like vLLM docs/en/quick_start/docker_deployment.md3-14

Regional Images: Separate Dockerfiles exist for global and China-region users. The China-region version pulls its base image through DaoCloud mirrors and downloads models from ModelScope docker/china/Dockerfile5-24. The global version defaults to Hugging Face docker/global/Dockerfile5-24
Base Images: Both regional Dockerfiles use vllm/vllm-openai:v0.21.0 (or v0.21.0-cu129 for CUDA 12.9) as the base image, which provides vLLM acceleration on compatible NVIDIA hardware docker/global/Dockerfile5-6 docker/china/Dockerfile5-6
Orchestration: A compose.yaml file defines profiles for openai-server, api, router, and gradio docker/compose.yaml1-123. It also configures health checks and reserves NVIDIA GPUs through the nvidia driver with the gpu capability docker/compose.yaml20-28
Container Lifecycle: The ENTRYPOINT in the official Dockerfiles sets MINERU_MODEL_SOURCE=local so that the container uses its pre-downloaded models at runtime docker/global/Dockerfile27 docker/china/Dockerfile27
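As an illustration of how the compose profiles are selected, the helper below assembles the command that starts one profile in the background. The function name `compose_up_command` is hypothetical. The profile names come from the list above, and `-f`, `--profile`, and `up -d` are standard Docker Compose options. The helper only prints the command and does not run it, so you can check the command before executing it yourself.

```python
import shlex
from typing import List

# Profile names as listed for MinerU's docker/compose.yaml; check your copy of the file.
COMPOSE_PROFILES = ("openai-server", "api", "router", "gradio")


def compose_up_command(profile: str, compose_file: str = "compose.yaml") -> List[str]:
    """Hypothetical helper: build the argv that starts one compose profile detached."""
    if profile not in COMPOSE_PROFILES:
        raise ValueError(f"unknown profile {profile!r}; expected one of {COMPOSE_PROFILES}")
    return ["docker", "compose", "-f", compose_file, "--profile", profile, "up", "-d"]


if __name__ == "__main__":
    # Print a shell-ready command line rather than executing it.
    print(shlex.join(compose_up_command("api")))
```

Rejecting unknown profile names early avoids a silent no-op. Docker Compose starts no profiled services when the profile name does not match any service.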

For details on building and running containers, see Docker Deployment.

Sources: docker/global/Dockerfile1-27 docker/china/Dockerfile1-27 docs/en/quick_start/docker_deployment.md5-50 docker/compose.yaml1-123

Hardware Acceleration

MinerU supports a wide range of hardware accelerators. It detects the available hardware at runtime and, on many platforms, lets you choose between the vLLM and lmdeploy inference backends.

NVIDIA GPUs: Supported via CUDA. vLLM acceleration requires a Volta-or-later GPU with at least 8 GB of VRAM docs/en/quick_start/docker_deployment.md18-25. VRAM usage can be tuned with the --gpu-memory-utilization flag in compose.yaml docker/compose.yaml12-15
Apple Silicon: MPS and MLX acceleration are available natively on macOS. Docker deployment on macOS is discouraged because containers cannot access these accelerators docs/en/quick_start/docker_deployment.md5-7
Domestic Accelerators: Chinese domestic hardware is also supported, including METAX (MACA) and T-Head (PPU), among others. These accelerators typically require vendor-specific base images and device mapping in the docker run command docs/en/quick_start/docker_deployment.md30-36
Lightweight Client Mode: On machines without a capable GPU, the lightweight mineru client can use the vlm-http-client backend to send inference to a remote OpenAI-compatible server docs/en/quick_start/docker_deployment.md58-67
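The choice these platform notes imply can be sketched as a small decision function. It is not MinerU's actual detection code. `Hardware`, `choose_deployment`, the vendor strings, and every returned label except `vlm-http-client` are hypothetical. The thresholds encode the NVIDIA requirement stated above: Volta, which corresponds to CUDA compute capability 7.0, and at least 8 GB of VRAM.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Hardware:
    """Hypothetical summary of what a runtime hardware probe might report."""
    vendor: str                                        # e.g. "nvidia", "apple", "metax"
    vram_gb: float = 0.0
    compute_capability: Optional[Tuple[int, int]] = None  # NVIDIA only, e.g. (8, 6)
    in_docker: bool = False


VOLTA = (7, 0)       # Volta GPUs report CUDA compute capability 7.0
MIN_VRAM_GB = 8.0    # minimum VRAM stated for vLLM acceleration


def choose_deployment(hw: Hardware) -> str:
    """Map detected hardware to a deployment path (illustrative labels only)."""
    if (
        hw.vendor == "nvidia"
        and hw.compute_capability is not None
        and hw.compute_capability >= VOLTA   # tuple comparison: (7, 5) >= (7, 0)
        and hw.vram_gb >= MIN_VRAM_GB
    ):
        return "local-vllm"
    if hw.vendor == "apple" and not hw.in_docker:
        # Docker on macOS cannot reach MPS/MLX, so only a native install qualifies.
        return "local-mps-mlx"
    if hw.vendor in {"metax", "t-head"}:
        # Domestic accelerators need vendor-specific base images and device mapping.
        return "vendor-image"
    # Anything else offloads inference to a remote OpenAI-compatible server.
    return "vlm-http-client"
```

The thresholds are kept as named constants because the VRAM floor is the value most likely to change between releases.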

For details on device detection and specific hardware configurations, see Hardware Acceleration.
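Client mode keeps the local side lightweight because all model inference runs on the remote server. The sketch below builds a client invocation. The `-p`, `-o`, `-b`, and `-u` options follow the usage shown in MinerU's quick-start documentation, but confirm them with `mineru --help` for your installed version. The helper name and the server address are placeholders.

```python
import shlex
from typing import List


def client_command(input_path: str, output_dir: str, server_url: str) -> List[str]:
    """Hypothetical helper: build a mineru call that offloads VLM inference remotely."""
    if not server_url.startswith(("http://", "https://")):
        raise ValueError("server_url must be an http(s) URL")
    return [
        "mineru",
        "-p", input_path,          # document or directory to parse
        "-o", output_dir,          # where parsed results are written
        "-b", "vlm-http-client",   # client backend: no local GPU inference
        "-u", server_url,          # OpenAI-compatible server endpoint
    ]


if __name__ == "__main__":
    # Placeholder address; point it at your own OpenAI-compatible server.
    print(shlex.join(client_command("demo.pdf", "./output", "http://127.0.0.1:30000")))
```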

Sources: docs/en/quick_start/docker_deployment.md5-25 docs/en/quick_start/docker_deployment.md58-67 docker/compose.yaml12-15

Multi-GPU & Enterprise Deployments

For high-throughput requirements, MinerU can be scaled across multiple GPUs or integrated into enterprise task queues.

Service Routing: The mineru-router service can aggregate multiple mineru-api instances across different GPUs using the --upstream-url flag or manage local workers with --local-gpus auto docker/compose.yaml60-83
VRAM Management: --gpu-memory-utilization caps the fraction of GPU memory that vLLM pre-allocates, which in turn determines the KV cache size. Lowering it (e.g., to 0.5 or less) helps prevent Out-Of-Memory (OOM) errors in vLLM environments docker/compose.yaml15
Device Isolation: Deployments can target specific GPUs by modifying device_ids in the compose.yaml resource reservations docker/compose.yaml27 docker/compose.yaml57
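This page does not describe mineru-router's actual scheduling policy. The sketch below therefore uses plain round-robin, a common load-balancing strategy, only to show what aggregating several upstream mineru-api instances means in practice. The class name and the upstream URLs are hypothetical.

```python
import itertools
from typing import Iterable, Iterator


class RoundRobinRouter:
    """Cycle requests across upstream API base URLs (illustrative, not mineru-router)."""

    def __init__(self, upstream_urls: Iterable[str]) -> None:
        # Normalise trailing slashes so joined request paths stay consistent.
        urls = [u.rstrip("/") for u in upstream_urls]
        if not urls:
            raise ValueError("at least one upstream URL is required")
        self._cycle: Iterator[str] = itertools.cycle(urls)

    def next_upstream(self) -> str:
        """Return the base URL that should receive the next request."""
        return next(self._cycle)


if __name__ == "__main__":
    # One hypothetical mineru-api instance per GPU.
    router = RoundRobinRouter(["http://gpu0:8000/", "http://gpu1:8000"])
    for _ in range(3):
        print(router.next_upstream())
```

Round-robin ignores how long each document takes to process. A production router would more likely track in-flight requests or check instance health before it picks an upstream.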

Multi-Accelerator Deployment Logic

Sources: docker/compose.yaml60-93 docker/compose.yaml15 docker/compose.yaml10

For details on scaling and enterprise integration, see Multi-GPU & Enterprise Deployments.
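The effect of the VRAM Management setting can be estimated with simple arithmetic. vLLM treats --gpu-memory-utilization as the fraction of total GPU memory the engine may claim. After the model weights are loaded and peak activations are profiled, the remainder of that budget becomes KV cache. The helper below is a back-of-the-envelope sketch with a hypothetical name and illustrative numbers. Real profiling adds further overheads, such as CUDA graphs, so treat its result as an upper bound.

```python
def kv_cache_budget_gib(total_gib: float, utilization: float,
                        weights_gib: float, activation_gib: float) -> float:
    """Rough KV-cache space left under vLLM's --gpu-memory-utilization cap."""
    if not 0.0 < utilization <= 1.0:
        raise ValueError("utilization must be in (0, 1]")
    budget = total_gib * utilization            # memory the engine allows itself
    kv = budget - weights_gib - activation_gib  # what remains for the KV cache
    return max(kv, 0.0)                         # a negative value means it cannot start


if __name__ == "__main__":
    # Illustrative numbers: a 24 GiB card, 2.5 GiB of weights, 1.5 GiB of activations.
    print(kv_cache_budget_gib(24.0, 0.5, 2.5, 1.5))
```

With these numbers, a setting of 0.5 leaves about 8 GiB for the KV cache and 12 GiB of the card free for other processes. That headroom explains why lowering the value helps avoid OOM errors when several services share one GPU. Set it too low, however, and the budget no longer covers the weights.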
