This page provides a high-level overview of the testing strategy and Continuous Integration/Continuous Deployment (CI/CD) pipelines for the MinerU project. The codebase employs a multi-layered testing approach, ranging from unit tests to end-to-end CLI validation, all orchestrated through GitHub Actions.
MinerU's testing infrastructure is designed to ensure the reliability of document parsing across various backends and hardware configurations. The suite includes traditional unit tests, integration tests for the CLI and SDK, and specialized benchmark scripts to evaluate model performance and extraction accuracy.
The testing strategy validates the flow from raw PDF bytes to the final middle_json and Markdown outputs. The CI environment uses uv for high-speed dependency installation of the [test] extra [.github/workflows/cli.yml28-36] and executes tests with Python 3.12 [.github/workflows/cli.yml34]. Automated coverage reporting is integrated into the workflow, utilizing clean_coverage.py to prepare the environment and get_coverage.py to aggregate results [.github/workflows/cli.yml37-39 tests/clean_coverage.py24-27].
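The aggregation step can be pictured with a small sketch. It assumes the suite has already been run under coverage run and exported with coverage.py's coverage json command, whose report carries a totals.percent_covered field. The helper name summarize_coverage and its minimum threshold are hypothetical and are not taken from get_coverage.py.

```python
import json
from pathlib import Path


def summarize_coverage(report_path: Path, minimum: float = 0.0) -> float:
    """Return the total coverage percentage from a `coverage json` report.

    Hypothetical helper: it assumes the report was written by coverage.py's
    `coverage json` command, whose top-level "totals" object carries a
    "percent_covered" field.
    """
    data = json.loads(Path(report_path).read_text(encoding="utf-8"))
    percent = float(data["totals"]["percent_covered"])
    if percent < minimum:
        # Raising SystemExit makes a CI step that calls this helper fail.
        raise SystemExit(f"coverage {percent:.2f}% is below {minimum:.2f}%")
    return percent
```

In a workflow, a step would run coverage json after the tests and then call summarize_coverage(Path("coverage.json"), minimum=...); raising SystemExit lets the step fail without extra plumbing.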
The following diagram illustrates the relationship between the testing tools and the core codebase entities:
Testing Entity Map
Sources: .github/workflows/cli.yml37-39 tests/clean_coverage.py8-25
For a detailed breakdown of the test files (including test_table.py and test_metascan_classify.py), CLI integration tests (test_cli_sdk.py), benchmark scoring (calculate_score.py), and coverage reporting, see Test Suite.
MinerU utilizes GitHub Actions to automate testing, documentation deployment, and the release process. There are four primary workflows:
- cli.yml: Runs the test suite for the master and dev branches [.github/workflows/cli.yml5-9]. It uses uv for fast dependency management [.github/workflows/cli.yml28-29] and executes the tests with coverage reporting via coverage run [.github/workflows/cli.yml38-39].
- mkdocs.yml: Uses mkdocs-deploy-gh-pages to manage the deployment lifecycle [.github/workflows/mkdocs.yml17-22].
- python-package.yml: Runs for tags matching *released [.github/workflows/python-package.yml7-9]. It handles version synchronization, cross-version installation checks (Python 3.10 to 3.13), and publishing to PyPI [.github/workflows/python-package.yml57-63 .github/workflows/python-package.yml140-144].
- cla.yml: Runs the CLAAssistant job [.github/workflows/cla.yml15-16]. It ensures contributors have signed the document at MinerU_CLA.md [.github/workflows/cla.yml29] before code is merged, and tracks signatures in signatures/version1/cla.json [.github/workflows/cla.yml28].

The release process is governed by update_version.py, which extracts version information from Git tags using git describe --tags via the get_version() function [update_version.py6-17] and updates the internal mineru/version.py file using write_version_to_commons() [update_version.py20-23]. This keeps the __version__ string consistent across the CLI, API, and PyPI package. The build job generates the wheel file, which is uploaded as a GitHub artifact before being published [.github/workflows/python-package.yml108-117].
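The version-sync logic described above can be sketched as follows. This is a minimal illustration, not the contents of update_version.py: the helper names (version_from_describe, get_version_sketch, write_version_sketch) are hypothetical, and the tag layout it parses (for example mineru-2.1.0-released, optionally followed by the -N-g<sha> suffix that git describe appends after later commits) is an assumption.

```python
import re
import subprocess
from pathlib import Path

# Hypothetical pattern: take the first X.Y.Z out of a tag such as
# "mineru-2.1.0-released" or "mineru-2.1.0-released-3-gabc1234".
_VERSION_RE = re.compile(r"(\d+\.\d+\.\d+)")


def version_from_describe(describe_output: str) -> str:
    """Extract a semantic version from `git describe --tags` output."""
    match = _VERSION_RE.search(describe_output)
    if match is None:
        raise ValueError(f"no version found in {describe_output!r}")
    return match.group(1)


def get_version_sketch(repo: Path = Path(".")) -> str:
    """Ask git for the nearest tag and convert it to a version string."""
    out = subprocess.run(
        ["git", "describe", "--tags"],
        cwd=repo, check=True, capture_output=True, text=True,
    ).stdout.strip()
    return version_from_describe(out)


def write_version_sketch(version: str, target: Path) -> None:
    """Overwrite the version module so __version__ matches the tag."""
    target.write_text(f'__version__ = "{version}"\n', encoding="utf-8")
```

Calling write_version_sketch(get_version_sketch(), Path("mineru/version.py")) from the repository root would mirror the documented flow of reading the tag and rewriting mineru/version.py. Keeping the parsing in a pure function makes it testable without a git checkout.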
Release Pipeline Flow
Sources: .github/workflows/python-package.yml15-144 update_version.py6-28
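Because the wheel is built once and then published, a pre-publish guard can confirm that the artifact carries the same version that update_version.py wrote. The sketch below parses wheel filenames using the layout defined in PEP 427; check_dist and the example filename mineru-2.1.0-py3-none-any.whl are hypothetical and are not part of python-package.yml.

```python
from pathlib import Path


def wheel_version(wheel_name: str) -> str:
    """Return the version field of a wheel filename (PEP 427 layout).

    Layout: {name}-{version}(-{build})?-{python}-{abi}-{platform}.whl
    Dashes inside the name are escaped to "_", so splitting on "-" is safe.
    """
    if not wheel_name.endswith(".whl"):
        raise ValueError(f"not a wheel: {wheel_name}")
    parts = wheel_name[: -len(".whl")].split("-")
    if len(parts) not in (5, 6):
        raise ValueError(f"unexpected wheel name: {wheel_name}")
    return parts[1]


def check_dist(dist_dir: Path, expected: str) -> list[str]:
    """Hypothetical guard: every wheel in dist_dir must carry `expected`."""
    wheels = sorted(p.name for p in dist_dir.glob("*.whl"))
    if not wheels:
        raise FileNotFoundError(f"no wheels in {dist_dir}")
    mismatched = [w for w in wheels if wheel_version(w) != expected]
    if mismatched:
        raise ValueError(f"version mismatch for {mismatched}, expected {expected}")
    return wheels
```

Running such a check between the build job and the publish step would catch a wheel whose version drifted from the Git tag before it ever reaches PyPI.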
For details on the release jobs, PyPI publishing via twine, and the automated versioning logic, see Release Pipeline & Versioning.
| Component | File / Tool | Purpose |
|---|---|---|
| Dependency Manager | uv | High-speed installation of .[test] and .[core] extras [.github/workflows/cli.yml28-36]. |
| Coverage | coverage.py | Measures test execution paths; managed via clean_coverage.py [.github/workflows/cli.yml38-39 tests/clean_coverage.py24-25]. |
| Versioning | update_version.py | Syncs mineru/version.py with Git tags via get_version() [update_version.py6-17]. |
| Publishing | twine | Validates and uploads wheels to PyPI using PYPI_TOKEN [.github/workflows/python-package.yml142-144]. |
| Docs | mkdocs | Builds and deploys the technical documentation to GitHub Pages via mkdocs-deploy-gh-pages [.github/workflows/mkdocs.yml17-22]. |
| Compatibility | check-install | Verifies installation across Python 3.10, 3.11, 3.12, and 3.13 [.github/workflows/python-package.yml57-63]. |
| Legal/CLA | cla.yml | Tracks contributor signatures in signatures/version1/cla.json [.github/workflows/cla.yml28]. |
Sources: .github/workflows/cli.yml28-39 .github/workflows/python-package.yml57-145 update_version.py6-28 tests/clean_coverage.py8-25 .github/workflows/cla.yml1-32
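The signature store referenced in the Legal/CLA row can be queried with a few lines of Python. This sketch assumes cla.json follows the layout used by the CLA Assistant Lite action (a signedContributors list whose entries carry a name field); that layout, and the has_signed helper itself, are assumptions rather than details confirmed by cla.yml.

```python
import json
from pathlib import Path


def has_signed(cla_file: Path, github_login: str) -> bool:
    """Return True if github_login appears in the CLA signature store.

    Assumed layout (CLA Assistant Lite style):
    {"signedContributors": [{"name": "...", "id": 123, ...}, ...]}
    """
    if not cla_file.exists():
        # No store yet means nobody has signed.
        return False
    data = json.loads(cla_file.read_text(encoding="utf-8"))
    signers = data.get("signedContributors", [])
    # GitHub logins are case-insensitive, so compare case-folded names.
    wanted = github_login.casefold()
    return any(entry.get("name", "").casefold() == wanted for entry in signers)
```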