@kmruiz (Collaborator) commented Oct 24, 2025

Proposed changes

Adds support for pre-filters in the aggregate tool when using the $vectorSearch stage. We are also adding more matchers to make the accuracy tests more precise, namely:

  • Matcher.caseInsensitiveString: checks that a string equals another, ignoring case. Some LLMs change the casing of values, and we have no control over that unless the user explicitly prompts for it, so our tests treat case-only differences as correct.
  • Matcher.not: negates a matcher, for example to ensure an array does not contain a specific value.
  • Matcher.arrayOrSingle: matches either [ value ] or value. This matters because MQL queries sometimes accept both forms.
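
To illustrate how the three matchers described above could behave, here is a minimal sketch. The `SketchMatcher` interface, the boolean `matches` method, and the `exactly` helper are assumptions made for illustration only; the real classes live in tests/accuracy/sdk/matcher.ts and may be shaped differently.

```typescript
// Hypothetical, simplified matcher interface (not the project's actual API).
interface SketchMatcher {
    matches(actual: unknown): boolean;
}

// Helper for the examples below: strict equality against one expected value.
const exactly = (expected: unknown): SketchMatcher => ({
    matches: (actual) => actual === expected,
});

// Accepts any string equal to `expected` once both are lower-cased.
const caseInsensitiveString = (expected: string): SketchMatcher => ({
    matches: (actual) =>
        typeof actual === "string" && actual.toLowerCase() === expected.toLowerCase(),
});

// Inverts the result of the wrapped matcher.
const not = (inner: SketchMatcher): SketchMatcher => ({
    matches: (actual) => !inner.matches(actual),
});

// Accepts either `value` or the one-element array `[ value ]`.
const arrayOrSingle = (inner: SketchMatcher): SketchMatcher => ({
    matches: (actual) =>
        Array.isArray(actual)
            ? actual.length === 1 && inner.matches(actual[0])
            : inner.matches(actual),
});
```

With these definitions, `arrayOrSingle(caseInsensitiveString("sci-fi"))` would accept both `"Sci-Fi"` and `["SCI-FI"]`, which is the kind of leniency the description asks for when the model's output differs only in casing or array wrapping.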

@kmruiz kmruiz requested a review from a team as a code owner October 24, 2025 13:03
@Copilot Copilot AI review requested due to automatic review settings October 24, 2025 13:03
Copilot AI (Contributor) left a comment

Pull Request Overview

This PR adds support for pre-filters in the $vectorSearch stage of the aggregate tool, along with new matcher utilities to improve accuracy test flexibility.

Key Changes:

  • Enhanced $vectorSearch filter field description to distinguish between pre-filtering (using indexed filter fields) and post-filtering (using $match stages)
  • Added three new matcher utilities: caseInsensitiveString for case-insensitive string comparisons, not for negating matchers, and arrayOrSingle for matching either array or single values
  • Expanded accuracy tests to validate pre-filter and post-filter scenarios in vector search queries
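
The distinction between pre-filtering and post-filtering can be sketched as an aggregation pipeline. This is an illustrative example, not code from the PR: the index name, field names, and filter values are hypothetical, and the pre-filter assumes `genre` is declared as a filter field in the vector search index.

```typescript
type Stage = Record<string, unknown>;

// Builds a pipeline that pre-filters inside $vectorSearch and post-filters with $match.
function buildPipeline(queryVector: number[]): Stage[] {
    return [
        {
            $vectorSearch: {
                index: "plot_index", // hypothetical index name
                path: "plot_embedding", // hypothetical embedding field
                queryVector,
                numCandidates: 100,
                limit: 10,
                // Pre-filter: applied during the vector search itself,
                // only valid on fields indexed as filter fields (assumed here).
                filter: { genre: { $eq: "sci-fi" } },
            },
        },
        // Post-filter: applied to the documents returned by the search,
        // so it can leave fewer than `limit` results.
        { $match: { year: { $gte: 2000 } } },
    ];
}

const pipeline = buildPipeline([0.1, 0.2, 0.3]);
```

The pre-filter narrows the candidate set before similarity ranking, while the `$match` stage only trims the documents that the search has already returned.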

Reviewed Changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| tests/accuracy/sdk/matcher.ts | Added new matcher classes (CaseInsensitiveStringMatcher, NotMatcher, ArrayOrSingleValueMatching) and corresponding factory methods |
| tests/accuracy/aggregate.test.ts | Added new test cases for pre-filter and post-filter scenarios, refactored common embedding parameters, and fixed typo "sci-fy" → "sci-fi" |
| src/tools/mongodb/read/aggregate.ts | Updated pipeline description with detailed guidance on pre-filtering vs post-filtering in $vectorSearch stages |

@coveralls (Collaborator) commented Oct 24, 2025

Pull Request Test Coverage Report for Build 18784361545

Details

  • 18 of 23 (78.26%) changed or added relevant lines in 2 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage decreased (-0.03%) to 80.105%

| Changes Missing Coverage | Covered Lines | Changed/Added Lines | % |
| --- | --- | --- | --- |
| src/common/search/embeddingsProvider.ts | 14 | 19 | 73.68% |

| Totals | Coverage Status |
| --- | --- |
| Change from base Build 18777096610 | -0.03% |
| Covered Lines | 6307 |
| Relevant Lines | 7732 |

💛 - Coveralls

Voyage will reject all requests with extra parameters

@github-actions (Contributor)

📊 Accuracy Test Results

📈 Summary

| Metric | Value |
| --- | --- |
| Commit SHA | ff4d03cfa0293d05df11a5459bc65b3846f15d65 |
| Run ID | 62b757e2-8449-41c4-97a2-595be939d617 |
| Status | done |
| Total Prompts Evaluated | 100 |
| Models Tested | 1 |
| Average Accuracy | 92.2% |
| Responses with 0% Accuracy | 7 |
| Responses with 75% Accuracy | 4 |
| Responses with 100% Accuracy | 91 |

📎 Download Full HTML Report - Look for the accuracy-test-summary artifact for detailed results.

Report generated on: 10/24/2025, 4:13:07 PM
