Hybrid Backend

The Hybrid Backend (hybrid-auto-engine) is a dual-path document analysis system that combines the structural understanding of Vision-Language Models (VLM) with the precision of specialized OCR and formula recognition models. It is designed to overcome the limitations of pure VLM approaches (e.g., hallucination in formulas or complex tables) by using the VLM as a "layout engine" while delegating content extraction to expert models.

1. Architecture Overview

The hybrid backend first uses a VLM to identify document structure and then applies traditional pipeline models for high-fidelity content extraction. It supports medium and high effort levels to balance speed and accuracy (mineru/backend/hybrid/hybrid_analyze.py:110-115).

Hybrid Pipeline Lifecycle

VLM Layout Analysis: The VLM identifies regions (text, title, table, image, equation). In medium effort mode, layout labels are mapped to VLM extraction types via MEDIUM_EFFORT_LAYOUT_LABEL_TO_VLM_TYPE (mineru/backend/hybrid/hybrid_analyze.py:83-107).
MagicModel Reconstruction: The hybrid module's MagicModel class transforms VLM outputs into a standardized block-based structure, handling coordinate normalization and category mapping (mineru/backend/hybrid/hybrid_model_output_to_middle_json.py:68-76).
Expert Model Refinement:
- OCR Detection: Specialized models refine text detection; the ocr_det function performs batch OCR detection on cropped regions (mineru/backend/hybrid/hybrid_analyze.py:150-158).
- Post-OCR Refinement: After block construction, _apply_post_ocr runs expert OCR models on cropped images to refine text content and confidence scores (mineru/backend/hybrid/hybrid_model_output_to_middle_json.py:127-143).
LLM-Aided Refinement: Optional LLM-based post-processing of the title hierarchy and semantic grouping (mineru/utils/title_level_postprocess.py:32-43).
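The medium-effort routing in the lifecycle above can be sketched as a lookup from layout labels to VLM extraction types. This is a minimal illustration, not the MinerU implementation: the label and type strings in LAYOUT_LABEL_TO_VLM_TYPE, the LayoutBlock dataclass, and route_layout_blocks are hypothetical stand-ins for MEDIUM_EFFORT_LAYOUT_LABEL_TO_VLM_TYPE and the code around it in hybrid_analyze.py.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# Hypothetical mapping; the real MEDIUM_EFFORT_LAYOUT_LABEL_TO_VLM_TYPE
# in hybrid_analyze.py may use different labels and target types.
LAYOUT_LABEL_TO_VLM_TYPE = {
    "text": "text",
    "title": "text",
    "table": "table",
    "equation": "equation",
}


@dataclass
class LayoutBlock:
    label: str
    bbox: Tuple[float, float, float, float]


def route_layout_blocks(blocks: List[LayoutBlock]) -> Dict[Optional[str], List[LayoutBlock]]:
    """Group layout blocks by the VLM extraction type their label maps to.

    Labels without a mapping (e.g. "image" in this sketch) are collected
    under the None key so the caller can handle them separately, for
    example by cropping them as image assets instead of extracting text.
    """
    routed: Dict[Optional[str], List[LayoutBlock]] = {}
    for block in blocks:
        vlm_type = LAYOUT_LABEL_TO_VLM_TYPE.get(block.label)
        routed.setdefault(vlm_type, []).append(block)
    return routed
```

Keeping the mapping as data rather than branching logic means an effort level can change which labels go to the VLM by swapping a dictionary.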

System Entity Bridge

The following diagram illustrates how high-level logical components map to specific code entities within the hybrid backend.

[Diagram: Hybrid Backend Entity Mapping]

Sources: mineru/backend/hybrid/hybrid_analyze.py:15-27, mineru/backend/hybrid/hybrid_model_output_to_middle_json.py:68-76, mineru/utils/title_level_postprocess.py:32-43, mineru/backend/hybrid/hybrid_model_output_to_middle_json.py:192-201

2. MagicModel Structure Reconstruction

The hybrid version of MagicModel maps VLM-detected categories to the internal BlockType and ContentType enums, and integrates PDF-extracted spans, inline formulas, and OCR results.

| Category Type | Mapping Methods | Output Block Collections | Source |
|---|---|---|---|
| Visual | `get_image_blocks()`, `get_table_blocks()` | `image_blocks`, `table_blocks` | mineru/backend/hybrid/hybrid_model_output_to_middle_json.py:77-78 |
| Structural | `get_title_blocks()`, `get_list_blocks()` | `title_blocks`, `list_blocks` | mineru/backend/hybrid/hybrid_model_output_to_middle_json.py:80-85 |
| Content | `get_text_blocks()`, `get_code_blocks()` | `text_blocks`, `code_blocks` | mineru/backend/hybrid/hybrid_model_output_to_middle_json.py:82-91 |
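A rough sketch of how typed VLM blocks could be bucketed into the collections listed in the table. The string type names, COLLECTION_FOR_TYPE, and collect_blocks are hypothetical; the real MagicModel works with BlockType and ContentType enums and exposes the collections through its get_*_blocks() methods.

```python
from collections import defaultdict
from typing import Dict, List

# Hypothetical type-to-collection table modeled on the mapping above.
# The real MagicModel uses BlockType enum values, not plain strings.
COLLECTION_FOR_TYPE = {
    "image": "image_blocks",
    "table": "table_blocks",
    "title": "title_blocks",
    "list": "list_blocks",
    "text": "text_blocks",
    "code": "code_blocks",
}


def collect_blocks(raw_blocks: List[dict]) -> Dict[str, List[dict]]:
    """Sort VLM blocks into named collections, preserving reading order.

    Blocks whose type has no entry in COLLECTION_FOR_TYPE are skipped.
    """
    collections: Dict[str, List[dict]] = defaultdict(list)
    for block in raw_blocks:
        name = COLLECTION_FOR_TYPE.get(block.get("type"))
        if name is not None:
            collections[name].append(block)
    return dict(collections)
```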

Content Refinement Logic

The reconstruction process involves several refinement steps:

Title Height Analysis: _resolve_title_line_avg_height calculates the average line height of title blocks, preferring _ocr_det_lines (hybrid OCR detection hints) over the standard line bboxes (mineru/backend/hybrid/hybrid_model_output_to_middle_json.py:32-44).
Image/Table Cropping: For spans marked as IMAGE, TABLE, or CHART, the system calls cut_image_and_table to crop the region from the page and write it out as an image asset (mineru/backend/hybrid/hybrid_model_output_to_middle_json.py:96-98).
Post-OCR Fallback: If OCR confidence is low, the system attempts to restore the original content via _restore_post_ocr_fallback (mineru/backend/hybrid/hybrid_model_output_to_middle_json.py:154-158).
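The title-height preference and the post-OCR fallback above can be illustrated with two small functions. This sketch assumes blocks are plain dicts whose lines carry an (x0, y0, x1, y1) "bbox"; the dict layout, the 0.5 score threshold, and the names resolve_title_line_avg_height and choose_span_content are hypothetical stand-ins for _resolve_title_line_avg_height and _restore_post_ocr_fallback.

```python
from typing import Optional


def resolve_title_line_avg_height(block: dict) -> Optional[float]:
    """Average line height of a title block.

    Prefers the hybrid OCR detection hints under "_ocr_det_lines" and
    falls back to the block's regular "lines". Returns None when neither
    source has a box with positive height. (Hypothetical dict layout.)
    """
    det_lines = block.get("_ocr_det_lines") or []
    source = det_lines if det_lines else block.get("lines", [])
    bboxes = [line["bbox"] for line in source]
    heights = [y1 - y0 for (_x0, y0, _x1, y1) in bboxes if y1 > y0]
    if not heights:
        return None
    return sum(heights) / len(heights)


def choose_span_content(original: dict, post_ocr: dict, min_score: float = 0.5) -> dict:
    """Keep the post-OCR result only if it is non-empty and confident enough.

    Otherwise fall back to the original span content. The threshold is an
    illustrative assumption, not a documented MinerU value.
    """
    if post_ocr.get("content") and post_ocr.get("score", 0.0) >= min_score:
        return post_ocr
    return original
```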

3. Hybrid Analysis Pipeline

The hybrid pipeline coordinates data flow between the VLM and the specialized models, using pypdfium2 for page rendering and coordinate calculations (mineru/backend/hybrid/hybrid_analyze.py:9-11).
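The coordinate bookkeeping behind page rendering can be sketched without opening a PDF. pypdfium2's render scale is pixels per PDF point, so a DPI value converts to a scale as dpi / 72. The assumption that VLM bboxes arrive normalized to [0, 1], and all function names below, are illustrative rather than taken from hybrid_analyze.py.

```python
from typing import Sequence, Tuple

Box = Tuple[float, float, float, float]


def dpi_to_scale(dpi: float) -> float:
    """Render scale in pypdfium2's convention: 1.0 means 72 DPI."""
    return dpi / 72.0


def normalized_to_pdf_points(bbox: Sequence[float], page_width_pt: float, page_height_pt: float) -> Box:
    """Map a [0, 1]-normalized (x0, y0, x1, y1) box onto page points.

    Assumption: both boxes use a top-left origin, as the rendered image does.
    """
    x0, y0, x1, y1 = bbox
    return (
        x0 * page_width_pt,
        y0 * page_height_pt,
        x1 * page_width_pt,
        y1 * page_height_pt,
    )


def pixels_to_pdf_points(bbox_px: Sequence[float], scale: float) -> Box:
    """Convert a pixel box on an image rendered at `scale` back to points."""
    x0, y0, x1, y1 = (v / scale for v in bbox_px)
    return (x0, y0, x1, y1)
```

Doing the conversion once, at the boundary between rendered images and page space, keeps every downstream consumer (cropping, span matching) working in a single coordinate system.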

[Diagram: Data Flow: hybrid_analyze]

Sources: mineru/backend/hybrid/hybrid_analyze.py:11-12, mineru/backend/hybrid/hybrid_analyze.py:150-158, mineru/backend/hybrid/hybrid_model_output_to_middle_json.py:192-201

Expert Model Integration

The hybrid backend can switch between VLM-native OCR and specialized pipeline OCR based on the result of ocr_classify (mineru/backend/hybrid/hybrid_analyze.py:140-148).
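The OCR switch can be pictured as a per-page dispatch between two engines. In this hedged sketch, the "ocr" label, the page-level granularity, and run_text_extraction are assumptions; the actual return values of ocr_classify and how hybrid_analyze.py acts on them may differ.

```python
from typing import Callable, List, TypeVar

Page = TypeVar("Page")


def run_text_extraction(
    pages: List[Page],
    classify_fn: Callable[[Page], str],
    vlm_ocr_fn: Callable[[Page], str],
    pipeline_ocr_fn: Callable[[Page], str],
) -> List[str]:
    """Route each page to VLM-native or pipeline OCR based on its class.

    Hypothetical convention: a label of "ocr" selects the pipeline OCR
    engine; any other label keeps the VLM-native text.
    """
    results = []
    for page in pages:
        label = classify_fn(page)
        engine = pipeline_ocr_fn if label == "ocr" else vlm_ocr_fn
        results.append(engine(page))
    return results
```

Passing the engines in as callables keeps the routing decision testable on its own, separately from any model loading.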

OCR Detection: ocr_det batch-processes cropped images, using crop_img with padding to extract each region for the OCR engine (mineru/backend/hybrid/hybrid_analyze.py:191-193).
Formula Masking: mask_formula_regions_for_ocr_det uses Mathematical Formula Detection (MFD) results to mask formula regions during OCR detection, preventing the OCR engine from picking up formulas as ordinary text and corrupting their LaTeX output (mineru/backend/hybrid/hybrid_analyze.py:201-204).
Batch Processing: The pipeline batches MFR (Mathematical Formula Recognition) and OCR inputs to improve GPU utilization (mineru/backend/hybrid/hybrid_analyze.py:69-70).
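The padding, masking, and batching steps listed above can be sketched with NumPy. crop_with_padding, mask_formula_regions, and iter_batches are hypothetical helpers, not the signatures of crop_img or mask_formula_regions_for_ocr_det; the white fill value and integer pixel bboxes are also assumptions.

```python
from typing import Iterator, List, Sequence, Tuple

import numpy as np


def crop_with_padding(image: np.ndarray, bbox: Tuple[int, int, int, int], pad: int):
    """Crop (x0, y0, x1, y1) plus `pad` pixels per side, clamped to the image.

    Returns the crop (a copy) and its top-left offset in page pixels.
    """
    h, w = image.shape[:2]
    x0, y0, x1, y1 = bbox
    x0, y0 = max(0, x0 - pad), max(0, y0 - pad)
    x1, y1 = min(w, x1 + pad), min(h, y1 + pad)
    return image[y0:y1, x0:x1].copy(), (x0, y0)


def mask_formula_regions(crop: np.ndarray, offset: Tuple[int, int],
                         formula_bboxes: Sequence[Tuple[int, int, int, int]],
                         fill: int = 255) -> np.ndarray:
    """Paint formula boxes (page pixel coords) with `fill` inside a crop."""
    ox, oy = offset
    h, w = crop.shape[:2]
    for fx0, fy0, fx1, fy1 in formula_bboxes:
        # Translate page coordinates into crop coordinates and clamp.
        cx0, cy0 = max(0, fx0 - ox), max(0, fy0 - oy)
        cx1, cy1 = min(w, fx1 - ox), min(h, fy1 - oy)
        if cx0 < cx1 and cy0 < cy1:  # skip boxes outside the crop
            crop[cy0:cy1, cx0:cx1] = fill
    return crop


def iter_batches(items: List, size: int) -> Iterator[List]:
    """Yield consecutive slices of at most `size` items for batched inference."""
    for start in range(0, len(items), size):
        yield items[start:start + size]
```

Masking before detection, rather than filtering detected lines afterwards, means the detector never proposes text boxes that straddle a formula.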

4. LLM-Aided Title Refinement

The system can optionally use an LLM to refine the document hierarchy when title_aided is enabled in the configuration (mineru/utils/title_level_postprocess.py:32-38).

Configuration Resolution: _resolve_title_aided_config reads the title_aided settings from mineru.json (mineru/utils/title_level_postprocess.py:17-29).
Refinement Execution: apply_title_leveling_to_pdf_info calls llm_aided_title, which uses semantic context to correct the title levels predicted by the VLM (mineru/utils/title_level_postprocess.py:32-43).
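A minimal sketch of the configuration gate and the leveling step. The "llm-aided-config" / "title_aided" / "enable" key layout is an assumption about mineru.json, and llm_fn is a hypothetical placeholder for the LLM call made by llm_aided_title; no network request is made here.

```python
from typing import Callable, List, Optional


def resolve_title_aided_config(config: dict) -> Optional[dict]:
    """Return the title_aided settings if present and enabled, else None.

    Assumed layout: {"llm-aided-config": {"title_aided": {"enable": true, ...}}}.
    """
    section = (config.get("llm-aided-config") or {}).get("title_aided")
    if not isinstance(section, dict) or not section.get("enable", False):
        return None
    return section


def apply_title_leveling(
    title_blocks: List[dict],
    config: dict,
    llm_fn: Callable[[List[str]], List[int]],
) -> List[dict]:
    """Attach an LLM-suggested level to each title block when enabled.

    llm_fn (hypothetical) receives the title texts in reading order and
    returns one integer level per title.
    """
    if resolve_title_aided_config(config) is None or not title_blocks:
        return title_blocks
    levels = llm_fn([block["text"] for block in title_blocks])
    if len(levels) != len(title_blocks):
        # Malformed answer: keep the VLM-predicted levels untouched.
        return title_blocks
    for block, level in zip(title_blocks, levels):
        block["level"] = level
    return title_blocks
```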

5. Middle JSON Finalization

In the hybrid flow, finalize_middle_json_from_preproc is called to group spans into paragraphs and apply layout-level corrections.

Paragraph and Text Merging

Paragraph Building: build_para_blocks_from_preproc initializes the paragraph-level structure by copying the layout blocks (mineru/backend/utils/para_block_utils.py:42-44).
Text Merging: merge_para_text_blocks merges adjacent text blocks, including across page boundaries. It uses LINE_STOP_FLAG (e.g., ".", "!", "?", "。") to decide whether a block ends a sentence (mineru/backend/utils/para_block_utils.py:8-14, 47-51).
Merge Barriers: Block types such as TITLE and INTERLINE_EQUATION belong to SECTION_MERGE_BARRIER_TYPES, which stops text from being merged across them (mineru/backend/utils/para_block_utils.py:9-14).
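The interaction between LINE_STOP_FLAG and the barrier types can be sketched as a single pass over a page-ordered block list. The lowercase type strings, the rule that non-barrier blocks (such as images) are stepped over, and merge_text_blocks itself are assumptions; joining with a single space is a simplification that does not suit CJK text.

```python
from typing import List

# Subset of stop characters named in the text; the real LINE_STOP_FLAG may hold more.
LINE_STOP_FLAG = (".", "!", "?", "。")
# Hypothetical string stand-ins for BlockType.TITLE and BlockType.INTERLINE_EQUATION.
SECTION_MERGE_BARRIER_TYPES = {"title", "interline_equation"}


def merge_text_blocks(blocks: List[dict]) -> List[dict]:
    """Merge text blocks that continue an unfinished sentence.

    A text block is appended to the most recent open text block unless a
    barrier type appeared in between or that block already ended with a
    stop character. Other non-text blocks are kept but do not close the
    open block. Input dicts are not modified.
    """
    out: List[dict] = []
    open_idx = None  # index in `out` of a text block that may continue
    for block in blocks:
        btype = block["type"]
        if btype in SECTION_MERGE_BARRIER_TYPES:
            open_idx = None
            out.append(dict(block))
            continue
        if btype != "text":
            out.append(dict(block))
            continue
        if open_idx is not None:
            prev = out[open_idx]
            prev["text"] = prev["text"].rstrip() + " " + block["text"].lstrip()
        else:
            out.append(dict(block))
            open_idx = len(out) - 1
        if out[open_idx]["text"].rstrip().endswith(LINE_STOP_FLAG):
            open_idx = None  # sentence finished; next text starts a new block
    return out
```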

Standardization

Title Normalization: _normalize_split_title_blocks standardizes the hybrid-specific title types (DOC_TITLE, PARAGRAPH_TITLE) to BlockType.TITLE for the final output schema (mineru/backend/hybrid/hybrid_model_output_to_middle_json.py:165-179).
Metadata Cleanup: Internal processing keys such as _ocr_det_lines and line_avg_height are removed in the final stage by cleanup_internal_para_block_metadata (mineru/backend/utils/para_block_utils.py:27-30).
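The two standardization steps can be sketched as in-place passes over the block list. The lowercase type names, the nested "blocks" key, and both function names are hypothetical stand-ins for _normalize_split_title_blocks and cleanup_internal_para_block_metadata.

```python
from typing import List

# Hypothetical string stand-ins for the hybrid-specific title types.
SPLIT_TITLE_TYPES = {"doc_title", "paragraph_title"}
# Internal keys named in the text above.
INTERNAL_KEYS = ("_ocr_det_lines", "line_avg_height")


def normalize_title_blocks(blocks: List[dict]) -> List[dict]:
    """Rewrite hybrid-specific title types to the generic "title" type."""
    for block in blocks:
        if block.get("type") in SPLIT_TITLE_TYPES:
            block["type"] = "title"
    return blocks


def cleanup_internal_metadata(blocks: List[dict]) -> List[dict]:
    """Drop internal processing keys, recursing into nested child blocks."""
    for block in blocks:
        for key in INTERNAL_KEYS:
            block.pop(key, None)
        # Assumed: children (e.g. captions inside an image block) live under "blocks".
        cleanup_internal_metadata(block.get("blocks", []))
    return blocks
```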

Sources: mineru/backend/hybrid/hybrid_analyze.py, mineru/backend/hybrid/hybrid_model_output_to_middle_json.py, mineru/backend/utils/para_block_utils.py, mineru/utils/title_level_postprocess.py
