LMSupply.Ocr
Version 0.34.20
Installation
- .NET CLI: dotnet add package LMSupply.Ocr --version 0.34.20
- Package Manager: NuGet\Install-Package LMSupply.Ocr -Version 0.34.20
- PackageReference: <PackageReference Include="LMSupply.Ocr" Version="0.34.20" />
- Central Package Management: <PackageVersion Include="LMSupply.Ocr" Version="0.34.20" /> in Directory.Packages.props, plus <PackageReference Include="LMSupply.Ocr" /> in the project file
- Paket CLI: paket add LMSupply.Ocr --version 0.34.20
- Script & Interactive: #r "nuget: LMSupply.Ocr, 0.34.20"
- File-based apps: #:package LMSupply.Ocr@0.34.20
- Cake Addin: #addin nuget:?package=LMSupply.Ocr&version=0.34.20
- Cake Tool: #tool nuget:?package=LMSupply.Ocr&version=0.34.20
A simple .NET library for local OCR (Optical Character Recognition) with automatic model downloading from HuggingFace. Features a 2-stage detection + recognition pipeline using PaddleOCR ONNX models.
Features
- 2-Stage Pipeline: Text detection (DBNet) followed by text recognition (CRNN with CTC decoding)
- Multi-language Support: 40+ languages including English, Korean, Chinese, Japanese, Arabic, and more
- Automatic Model Download: Models are downloaded on-demand from HuggingFace (~10MB default)
- GPU Acceleration: Supports CUDA, DirectML, and CoreML
- Pure C# Implementation: No Python dependencies or external processes
- Polygon Support: Precise text region boundaries for rotated or curved text
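The recognition stage listed above ends with CTC decoding. As a rough illustration of what greedy CTC decoding does (not the library's actual code), the sketch below picks the best-scoring symbol at each time step, collapses consecutive repeats, and drops blanks. The `CtcSketch` class, the toy alphabet, and the choice of index 0 as the blank are all assumptions made for this example.

```csharp
using System;
using System.Text;

// Hypothetical toy input: 6 time steps over a 4-symbol alphabet.
// Index 0 is assumed to be the CTC blank symbol.
char[] alphabet = { '\0', 'c', 'a', 't' };
float[][] scores =
{
    new[] { 0.1f, 0.9f, 0.0f, 0.0f }, // c
    new[] { 0.1f, 0.8f, 0.1f, 0.0f }, // c again (collapsed as a repeat)
    new[] { 0.9f, 0.0f, 0.1f, 0.0f }, // blank
    new[] { 0.0f, 0.1f, 0.9f, 0.0f }, // a
    new[] { 0.0f, 0.0f, 0.2f, 0.8f }, // t
    new[] { 0.7f, 0.0f, 0.0f, 0.3f }, // blank
};

Console.WriteLine(CtcSketch.GreedyDecode(scores, alphabet)); // prints "cat"

static class CtcSketch
{
    // Greedy CTC decoding: argmax per step, collapse repeats, remove blanks.
    public static string GreedyDecode(float[][] scores, char[] alphabet, int blankIndex = 0)
    {
        var text = new StringBuilder();
        int previous = blankIndex;
        foreach (var step in scores)
        {
            // Find the index of the highest score at this time step.
            int best = 0;
            for (int i = 1; i < step.Length; i++)
                if (step[i] > step[best]) best = i;

            // Emit only when the symbol is not blank and differs from the previous step.
            if (best != blankIndex && best != previous)
                text.Append(alphabet[best]);
            previous = best;
        }
        return text.ToString();
    }
}
```

A blank between two identical symbols is what lets doubled letters survive: the steps `a, blank, a` decode to "aa", while `a, a` decode to "a".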
Quick Start
using LMSupply.Ocr;
// Load default OCR pipeline (English)
await using var ocr = await LocalOcr.LoadAsync();
// Recognize text in an image
var result = await ocr.RecognizeAsync("document.png");
// Get all text
Console.WriteLine(result.FullText);
// Access individual text regions
foreach (var region in result.Regions)
{
    Console.WriteLine($"[{region.Confidence:P0}] {region.Text}");
    Console.WriteLine($"  Location: {region.BoundingBox}");
}
Language-Specific OCR
// Load OCR for Korean text
await using var ocr = await LocalOcr.LoadForLanguageAsync("ko");
// Or, as an alternative, specify the recognition model explicitly
// (named differently here so both declarations compile in the same scope)
await using var explicitOcr = await LocalOcr.LoadAsync(
    detectionModel: "default",
    recognitionModel: "crnn-korean-v3");
Supported Languages
| Model | Languages |
|---|---|
| crnn-en-v3 | English |
| crnn-korean-v3 | Korean |
| crnn-chinese-v3 | Chinese (Simplified/Traditional) |
| crnn-japan-v3 | Japanese |
| crnn-latin-v3 | Spanish, French, German, Italian, Portuguese, etc. |
| crnn-arabic-v3 | Arabic |
| crnn-cyrillic-v3 | Russian, Ukrainian, Bulgarian, etc. |
| crnn-devanagari-v3 | Hindi, Marathi, Nepali, Sanskrit |
Configuration Options
var options = new OcrOptions
{
    LanguageHint = "en",               // Language hint for auto model selection
    DetectionThreshold = 0.5f,         // Minimum detection confidence
    RecognitionThreshold = 0.5f,       // Minimum recognition confidence
    BinarizationThreshold = 0.3f,      // DBNet binarization threshold
    UnclipRatio = 1.5f,                // Polygon expansion ratio
    UsePolygon = true,                 // Use polygon coordinates
    Provider = ExecutionProvider.Auto, // GPU acceleration
    CacheDirectory = null              // Custom cache directory
};
await using var ocr = await LocalOcr.LoadAsync(options: options);
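To give a feel for what a polygon expansion ratio such as `UnclipRatio` controls, the sketch below computes the offset distance used in DBNet-style post-processing, where a shrunken detection polygon is grown outward by D = A × r / L (A is the polygon area, L its perimeter, r the ratio). Whether LMSupply.Ocr applies exactly this formula is an assumption; `UnclipSketch` and the sample box are hypothetical names and data for illustration only.

```csharp
using System;

// Hypothetical detected box: a 100 x 20 pixel axis-aligned rectangle.
(double X, double Y)[] box = { (0.0, 0.0), (100.0, 0.0), (100.0, 20.0), (0.0, 20.0) };

// Area 2000, perimeter 240, ratio 1.5 gives 2000 * 1.5 / 240 = 12.5 pixels.
double distance = UnclipSketch.OffsetDistance(box, unclipRatio: 1.5);
Console.WriteLine(distance);

static class UnclipSketch
{
    // DBNet-style offset distance: area * ratio / perimeter.
    public static double OffsetDistance((double X, double Y)[] polygon, double unclipRatio)
    {
        double twiceArea = 0, perimeter = 0;
        for (int i = 0; i < polygon.Length; i++)
        {
            var a = polygon[i];
            var b = polygon[(i + 1) % polygon.Length];
            twiceArea += a.X * b.Y - b.X * a.Y; // shoelace formula term
            perimeter += Math.Sqrt((b.X - a.X) * (b.X - a.X) + (b.Y - a.Y) * (b.Y - a.Y));
        }
        double area = Math.Abs(twiceArea) / 2.0;
        return area * unclipRatio / perimeter;
    }
}
```

Under this formula a larger ratio grows every region further outward, which helps avoid clipping character edges but can merge neighbouring lines if set too high.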
Detection Only
// Get text regions without recognition
var regions = await ocr.DetectAsync("document.png");
foreach (var region in regions)
{
    Console.WriteLine($"Found text at: {region.BoundingBox}");
}
Layout-Aware Text Extraction
var result = await ocr.RecognizeAsync("document.png");
// Get text with layout preserved (same-line regions joined with spaces)
var layoutText = result.GetTextWithLayout(lineTolerancePixels: 10);
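One plausible way a line tolerance can be used is sketched below: regions are sorted top to bottom, a region joins the current line when its vertical offset from that line's first region is within the tolerance, and each line is then ordered left to right and joined with spaces. This is a guess at the general technique, not the library's implementation; the `Word` record and `LayoutSketch` class are hypothetical stand-ins.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical recognized regions, deliberately out of reading order.
var regions = new List<Word>
{
    new Word("world", 60, 12),
    new Word("Hello", 10, 10),
    new Word("Second", 10, 40),
    new Word("line", 70, 43),
};

Console.WriteLine(LayoutSketch.JoinLines(regions, lineTolerancePixels: 10));

// Hypothetical stand-in for a recognized region: text plus top-left corner.
record Word(string Text, float X, float Y);

static class LayoutSketch
{
    public static string JoinLines(IEnumerable<Word> words, float lineTolerancePixels)
    {
        var lines = new List<List<Word>>();
        foreach (var word in words.OrderBy(w => w.Y))
        {
            var current = lines.LastOrDefault();
            // Start a new line when the vertical gap to the line's first word exceeds the tolerance.
            if (current is null || Math.Abs(word.Y - current[0].Y) > lineTolerancePixels)
                lines.Add(current = new List<Word>());
            current.Add(word);
        }
        // Within each line, order left to right and join with spaces.
        return string.Join("\n",
            lines.Select(l => string.Join(" ", l.OrderBy(w => w.X).Select(w => w.Text))));
    }
}
```

With the sample data the first two regions differ by 2 pixels vertically and land on one line, while the third is 30 pixels lower and starts a new one.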
Model Information
The library uses PaddleOCR ONNX models from HuggingFace:
- Repository: monkt/paddleocr-onnx
- Detection model: DBNet (~2.3MB)
- Recognition models: CRNN (~7-13MB depending on language)
Models are cached locally after first download.
| Product | Compatible and additional computed target framework versions |
|---|---|
| .NET | net10.0 is compatible. net10.0-android was computed. net10.0-browser was computed. net10.0-ios was computed. net10.0-maccatalyst was computed. net10.0-macos was computed. net10.0-tvos was computed. net10.0-windows was computed. |
Dependencies
- net10.0
  - LMSupply.Vision.Core (>= 0.34.20)
  - System.Numerics.Tensors (>= 10.0.8)
NuGet packages (1)
One NuGet package depends on LMSupply.Ocr:
| Package | Description |
|---|---|
| FileFlux | Complete document processing SDK optimized for RAG systems. Transforms PDF, DOCX, Excel, PowerPoint, Markdown and other formats into high-quality chunks with intelligent semantic boundary detection. Includes advanced chunking strategies, metadata extraction, and performance optimization. |
GitHub repositories
This package is not used by any popular GitHub repositories.
| Version | Downloads | Last Updated |
|---|---|---|
| 0.34.20 | 96 | 6/2/2026 |
| 0.34.19 | 143 | 5/25/2026 |
| 0.34.18 | 115 | 5/21/2026 |
| 0.34.17 | 241 | 5/20/2026 |
| 0.34.16 | 113 | 5/15/2026 |
| 0.34.15 | 103 | 5/14/2026 |
| 0.34.13 | 114 | 5/12/2026 |
| 0.34.12 | 114 | 5/11/2026 |
| 0.34.3 | 100 | 5/7/2026 |
| 0.34.2 | 97 | 5/5/2026 |
| 0.34.1 | 108 | 5/3/2026 |
| 0.34.0 | 102 | 5/2/2026 |
| 0.33.0 | 106 | 5/2/2026 |
| 0.32.7 | 106 | 5/2/2026 |
| 0.32.3 | 103 | 4/30/2026 |
| 0.32.0 | 130 | 4/15/2026 |
| 0.31.0 | 113 | 4/14/2026 |
| 0.30.0 | 110 | 4/14/2026 |
| 0.29.0 | 125 | 4/13/2026 |
| 0.28.0 | 108 | 4/13/2026 |