Mistral DL with documentation #293
Pull Request Overview
This PR adds support for the Mistral OCR Document Loader along with associated tests and documentation.
- Introduces the DocumentLoaderMistralOCR and MistralOCRConfig classes.
- Updates module exports and documentation to include the new loader.
- Adds new tests to validate configuration, URL, file, and BytesIO processing for Mistral OCR.
Reviewed Changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| tests/test_extractor.py | Adds a main block for manual testing; however, it references an undefined variable and a misnamed loader class. |
| tests/test_document_loader_mistral_ocr.py | New test cases for validating the Mistral OCR loader functionality. |
| extract_thinker/document_loader/document_loader_mistral_ocr.py | Implements the new Mistral OCR loader with payload preparation, file upload, and pagination support. |
| extract_thinker/__init__.py | Updates module exports to include the new Mistral OCR loader and its configuration. |
| docs/core-concepts/document-loaders/mistral-ocr.md | Provides comprehensive documentation and usage examples for the Mistral OCR loader. |

    if __name__ == "__main__":
        pdf_path = os.path.join(cwd, "tests", "files", "invoice.pdf")

Copilot
AI
Apr 1, 2025
The variable 'cwd' is not defined; consider using os.getcwd() to obtain the current working directory.
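A minimal sketch of the fix this comment suggests, assuming the manual-test block is run from the repository root so that `tests/files/invoice.pdf` resolves relative to the working directory. Only `cwd`, `pdf_path`, and the path components come from the reviewed diff; the rest is illustrative.

```python
import os

# Bind cwd explicitly before it is used; the reviewed code referenced it
# without ever defining it.
cwd = os.getcwd()

# Same path construction as in the reviewed diff.
pdf_path = os.path.join(cwd, "tests", "files", "invoice.pdf")
print(pdf_path)
```

Because `os.getcwd()` depends on where the script is launched from, a path anchored on the test file's own location may be more robust; that is a design choice for the PR author, not something the review requires.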

        pdf_path = os.path.join(cwd, "tests", "files", "invoice.pdf")
        extractor = Extractor()
        extractor.load_document_loader(DocumentLoaderDocling())

Copilot
AI
Apr 1, 2025
The class 'DocumentLoaderDocling' appears to be a typo or misreference; update it to 'DocumentLoaderAWSTextract' as imported.
Suggested change
-     extractor.load_document_loader(DocumentLoaderDocling())
+     extractor.load_document_loader(DocumentLoaderAWSTextract())
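Both review comments concern names that are used without being defined or imported. As a rough illustration of how such issues can be caught before review, the sketch below uses Python's standard `ast` module to list names that are read but never bound in a source string. The helper `undefined_names` is hypothetical and not part of this project, and the import line inside `snippet` is an assumption about what the test file imports. The check is deliberately scope-insensitive, so a real linter such as pyflakes is the better tool in practice.

```python
import ast
import builtins


def undefined_names(source):
    """Return names that are read in `source` but never imported, assigned, or defined.

    Hypothetical helper for illustration only: it ignores scoping rules.
    """
    tree = ast.parse(source)
    bound = set(dir(builtins))  # built-in names, including __name__
    used = set()
    for node in ast.walk(tree):
        if isinstance(node, (ast.Import, ast.ImportFrom)):
            for alias in node.names:
                # "import a.b" binds "a"; "import a as x" binds "x".
                bound.add((alias.asname or alias.name).split(".")[0])
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            bound.add(node.name)
        elif isinstance(node, ast.arg):
            bound.add(node.arg)
        elif isinstance(node, ast.Name):
            if isinstance(node.ctx, ast.Store):
                bound.add(node.id)
            else:
                used.add(node.id)
    return used - bound


# The main block from the reviewed diff; the imports are assumed for illustration.
snippet = '''
import os
from extract_thinker import Extractor, DocumentLoaderAWSTextract

if __name__ == "__main__":
    pdf_path = os.path.join(cwd, "tests", "files", "invoice.pdf")
    extractor = Extractor()
    extractor.load_document_loader(DocumentLoaderDocling())
'''

print(sorted(undefined_names(snippet)))
```

The snippet is only parsed, never executed, so `extract_thinker` does not need to be installed for the check to run.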
cody-sugarman
left a comment
👀
No description provided.