This page provides a high-level overview of the deployment strategies and infrastructure support for MinerU. It covers containerization, hardware acceleration across diverse platforms, and scaling to multi-GPU or enterprise environments.
MinerU is designed to be portable across environments ranging from local CPU-only machines to high-performance GPU clusters. The system leverages vLLM and lmdeploy for inference acceleration and provides several entry points for different use cases (docs/zh/quick_start/docker_deployment.md18-25).
The following diagram illustrates how different deployment modes (CLI, API, Web) interact with the underlying hardware and inference engines.
Infrastructure Dispatch Diagram
Sources: docs/zh/quick_start/docker_deployment.md18-25 docker/compose.yaml60-93 docs/en/quick_start/docker_deployment.md58-67
MinerU provides specialized Docker environments to simplify dependency management, especially for complex inference frameworks like vLLM (docs/en/quick_start/docker_deployment.md3-14).
- The China version uses DaoCloud mirrors and ModelScope for model downloads (docker/china/Dockerfile5-24). The global version defaults to HuggingFace (docker/global/Dockerfile5-24).
- Both images use vllm/vllm-openai:v0.21.0 (or v0.21.0-cu129 for CUDA 12.9) as the base image to provide vLLM acceleration on compatible NVIDIA hardware (docker/global/Dockerfile5-6 docker/china/Dockerfile5-6).
- The compose.yaml file supports profiles for openai-server, api, router, and gradio (docker/compose.yaml1-123). It includes health checks and resource reservations for NVIDIA GPUs using the nvidia driver and gpu capabilities (docker/compose.yaml20-28).
- The ENTRYPOINT in the official Dockerfiles sets MINERU_MODEL_SOURCE=local so that the container uses pre-downloaded models during execution (docker/global/Dockerfile27 docker/china/Dockerfile27).

For details on building and running containers, see Docker Deployment.
Sources: docker/global/Dockerfile1-27 docker/china/Dockerfile1-27 docs/en/quick_start/docker_deployment.md5-50 docker/compose.yaml1-123
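The profile names listed above can be turned into a Docker Compose invocation. The following is a minimal sketch, not part of MinerU: the helper name build_compose_command and the default compose file path are assumptions for illustration, while `docker compose -f ... --profile ... up -d` is standard Docker Compose CLI syntax.

```python
import shlex

# Profile names are taken from the text above; everything else is illustrative.
KNOWN_PROFILES = ("openai-server", "api", "router", "gradio")


def build_compose_command(profile, compose_file="docker/compose.yaml", detach=True):
    """Return the argv list that starts one Compose profile (hypothetical helper)."""
    if profile not in KNOWN_PROFILES:
        raise ValueError(f"unknown profile {profile!r}; expected one of {KNOWN_PROFILES}")
    argv = ["docker", "compose", "-f", compose_file, "--profile", profile, "up"]
    if detach:
        argv.append("-d")  # run the services in the background
    return argv


if __name__ == "__main__":
    # Print the command instead of running it, so the sketch has no side effects.
    print(shlex.join(build_compose_command("api")))
```

Returning an argv list rather than a shell string keeps the command safe to pass to subprocess.run without shell quoting issues.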
MinerU supports a wide array of hardware accelerators. The system detects available hardware at runtime to optimize performance, often allowing a choice between vLLM and lmdeploy backends.
- vLLM acceleration (docs/en/quick_start/docker_deployment.md18-25). Users can tune VRAM usage via the --gpu-memory-utilization flag in compose.yaml (docker/compose.yaml12-15).
- docker run (docs/en/quick_start/docker_deployment.md30-36).
- The mineru client can connect to remote OpenAI-compatible servers using the vlm-http-client backend (docs/en/quick_start/docker_deployment.md58-67).

For details on device detection and specific hardware configurations, see Hardware Acceleration.
Sources: docs/en/quick_start/docker_deployment.md5-25 docs/en/quick_start/docker_deployment.md58-67 docker/compose.yaml12-15
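To make the effect of --gpu-memory-utilization concrete, the sketch below does the rough arithmetic. It assumes the flag is the fraction of total VRAM the inference engine may claim, and that whatever remains after the model weights is available for the KV cache. The function name and the example numbers are hypothetical, and activation memory and framework overhead are ignored.

```python
def vram_budget_gib(total_vram_gib, gpu_memory_utilization, model_weights_gib):
    """Return (engine_budget, kv_cache_headroom) in GiB.

    Rough estimate only: ignores activation memory and framework overhead.
    """
    if not 0.0 < gpu_memory_utilization <= 1.0:
        raise ValueError("gpu_memory_utilization must be in (0, 1]")
    # Share of the card the engine is allowed to claim.
    engine_budget = total_vram_gib * gpu_memory_utilization
    # What is left for the KV cache once the weights are loaded.
    kv_cache_headroom = max(engine_budget - model_weights_gib, 0.0)
    return engine_budget, kv_cache_headroom


if __name__ == "__main__":
    # Illustrative numbers: a 24 GiB card, utilization 0.5, 3 GiB of weights.
    budget, headroom = vram_budget_gib(24.0, 0.5, 3.0)
    print(f"engine budget: {budget} GiB, KV cache headroom: {headroom} GiB")
```

Lowering the fraction shrinks the engine's share of the card, which leaves more memory for other processes at the cost of a smaller KV cache.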
For high-throughput requirements, MinerU can be scaled across multiple GPUs or integrated into enterprise task queues.
- The mineru-router service can aggregate multiple mineru-api instances across different GPUs using the --upstream-url flag, or manage local workers with --local-gpus auto (docker/compose.yaml60-83).
- Parameters such as --gpu-memory-utilization allow tuning the KV cache size (e.g., setting it to 0.5 or lower) to prevent Out-Of-Memory (OOM) errors in vLLM environments (docker/compose.yaml15).
- Specific GPUs are assigned through device_ids in the compose.yaml resource reservations (docker/compose.yaml27 docker/compose.yaml57).

Multi-Accelerator Deployment Logic
Sources: docker/compose.yaml60-93 docker/compose.yaml15 docker/compose.yaml10
For details on scaling and enterprise integration, see Multi-GPU & Enterprise Deployments.
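The device_ids reservations mentioned in this section follow the Docker Compose GPU reservation layout (deploy.resources.reservations.devices). As a minimal sketch, the helper below builds that structure as a Python dictionary for a given set of GPUs. The function name nvidia_reservation is an assumption for illustration and does not come from the MinerU repository.

```python
import json


def nvidia_reservation(device_ids):
    """Build a Compose-style GPU reservation for the given GPU indices (hypothetical helper)."""
    ids = [str(i) for i in device_ids]  # Compose expects device IDs as strings
    if not ids:
        raise ValueError("at least one device id is required")
    return {
        "deploy": {
            "resources": {
                "reservations": {
                    "devices": [
                        {
                            "driver": "nvidia",
                            "device_ids": ids,
                            "capabilities": ["gpu"],
                        }
                    ]
                }
            }
        }
    }


if __name__ == "__main__":
    # One reservation block per GPU, e.g. for pinning one worker to each card.
    for gpu in (0, 1):
        print(json.dumps(nvidia_reservation([gpu]), indent=2))
```

Pinning each service to its own device_ids entry is one way to keep workers from competing for the same card. How the router then addresses those workers is covered by the flags described above.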