Conversation

@enoch3712
Owner

No description provided.

Copilot AI review requested due to automatic review settings April 1, 2025 23:37
@enoch3712 enoch3712 linked an issue Apr 1, 2025 that may be closed by this pull request
Contributor

Copilot AI left a comment

Pull Request Overview

This PR adds support for the Mistral OCR Document Loader along with associated tests and documentation.

  • Introduces the DocumentLoaderMistralOCR and MistralOCRConfig classes.
  • Updates module exports and documentation to include the new loader.
  • Adds new tests to validate configuration, URL, file, and BytesIO processing for Mistral OCR.

Reviewed Changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| tests/test_extractor.py | Adds a main block for manual testing; however, contains undefined variable issues and a misnamed loader instance. |
| tests/test_document_loader_mistral_ocr.py | New test cases for validating the Mistral OCR loader functionality. |
| extract_thinker/document_loader/document_loader_mistral_ocr.py | Implements the new Mistral OCR loader with payload preparation, file upload, and pagination support. |
| extract_thinker/__init__.py | Updates module exports to include the new Mistral OCR loader and its configuration. |
| docs/core-concepts/document-loaders/mistral-ocr.md | Provides comprehensive documentation and usage examples for the Mistral OCR loader. |


if __name__ == "__main__":

    pdf_path = os.path.join(cwd, "tests", "files", "invoice.pdf")
Copilot AI Apr 1, 2025

The variable 'cwd' is not defined; consider using os.getcwd() to obtain the current working directory.
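
One way to apply this suggestion is sketched below. It is a minimal illustration, not the code that was merged: the helper name `resolve_test_pdf` and its `base_dir` parameter are hypothetical, and only the path components come from the diff above.

```python
import os


def resolve_test_pdf(base_dir=None):
    """Build the path to the sample invoice used by the manual test.

    `base_dir` is a hypothetical parameter added for illustration; when it
    is omitted, the current working directory is used, as the review
    comment suggests.
    """
    cwd = base_dir if base_dir is not None else os.getcwd()
    return os.path.join(cwd, "tests", "files", "invoice.pdf")


if __name__ == "__main__":
    # Prints a path rooted at the directory the script is run from.
    print(resolve_test_pdf())
```

Note that `os.getcwd()` depends on where the script is launched, so this only finds the file when the script is run from the repository root.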

    pdf_path = os.path.join(cwd, "tests", "files", "invoice.pdf")

    extractor = Extractor()
    extractor.load_document_loader(DocumentLoaderDocling())
Copilot AI Apr 1, 2025

The class 'DocumentLoaderDocling' appears to be a typo or misreference; update it to 'DocumentLoaderAWSTextract' as imported.

Suggested change
- extractor.load_document_loader(DocumentLoaderDocling())
+ extractor.load_document_loader(DocumentLoaderAWSTextract())
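
Both review comments concern names that are used without being defined or imported. As a rough sketch of how such problems can be caught before review, the snippet below walks a module's syntax tree with the standard `ast` module and reports names that are read but never bound. The function `find_undefined_names` is hypothetical and ignores scoping, so it is far cruder than a real linter. The import line inside `snippet` is an assumption reconstructed from the comment's "as imported" remark, not a copy of the actual file.

```python
import ast
import builtins


def find_undefined_names(source: str) -> set[str]:
    """Return names that are read in `source` but never imported or assigned.

    Scope-insensitive by design: any binding anywhere in the module counts.
    """
    tree = ast.parse(source)
    defined = set(dir(builtins)) | {"__name__", "__file__"}
    used = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                # `import a.b` binds `a`; `import a as x` binds `x`.
                defined.add((alias.asname or alias.name).split(".")[0])
        elif isinstance(node, ast.ImportFrom):
            for alias in node.names:
                defined.add(alias.asname or alias.name)
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            defined.add(node.name)
        elif isinstance(node, ast.arg):
            defined.add(node.arg)
        elif isinstance(node, ast.Name):
            if isinstance(node.ctx, ast.Store):
                defined.add(node.id)
            else:
                used.add(node.id)
    return used - defined


# Assumed reconstruction of the reviewed main block (imports are a guess).
snippet = '''
import os
from extract_thinker import Extractor, DocumentLoaderAWSTextract

if __name__ == "__main__":
    pdf_path = os.path.join(cwd, "tests", "files", "invoice.pdf")
    extractor = Extractor()
    extractor.load_document_loader(DocumentLoaderDocling())
'''

if __name__ == "__main__":
    print(sorted(find_undefined_names(snippet)))
```

The source is only parsed, never executed, so `extract_thinker` does not need to be installed for the check to run.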

@cody-sugarman cody-sugarman left a comment

👀

@enoch3712 enoch3712 merged commit 78eebf9 into main Apr 2, 2025
6 checks passed
@enoch3712 enoch3712 deleted the 292-mistral-dl branch April 2, 2025 00:55

Development

Successfully merging this pull request may close these issues.

Mistral DL

3 participants