Add Support for Update by Metadata Filter #533
Draft
Summary
This PR adds support for updating vectors by metadata filter across all API implementations (REST, Asyncio, and gRPC). Previously, the `update` method only supported updating vectors by ID. Now users can update multiple vectors at once by specifying a metadata filter, similar to how `delete` and `fetch_by_metadata` work.
Background
The core OpenAPI layer (`pinecone/core/openapi/db_data/model/update_request.py`) already supported the `filter` and `dry_run` parameters in `UpdateRequest`, but these were not exposed in the user-facing API. This PR makes them accessible to users.
Changes
API Updates
- `filter` parameter: a metadata filter expression that selects which vectors to update
- `dry_run` parameter: when `True`, returns the count of matching vectors without executing the update
- `id` parameter is now optional: when `filter` is provided, `id` must be `None`
- Responses include a `matched_records` count when `filter` is used (even without `dry_run`)
Implementation Details
Interface Definitions (`pinecone/db_data/interfaces.py`, `pinecone/db_data/index_asyncio_interface.py`)
- Updated the `update()` methods to include the `filter` and `dry_run` parameters
REST Implementation (`pinecone/db_data/index.py`)
- Validates that `id` and `filter` are mutually exclusive
- Ensures that either `id` or `filter` is provided
- Returns `matched_records` when present in the response
Asyncio Implementation (`pinecone/db_data/index_asyncio.py`)
gRPC Implementation (`pinecone/grpc/index_grpc.py`)
- Converts the `filter` dict to a `google.protobuf.Struct` using the existing `dict_to_proto_struct()` utility
- Surfaces `matched_records` in the update response
- Updated `parse_update_response()` in `pinecone/grpc/utils.py` to extract `matched_records` from responses
Request Factory (`pinecone/db_data/request_factory.py`)
- Updated `update_request()` to explicitly handle the `filter` and `dry_run` parameters
- Includes `id` in the request only when it is not `None` (to support filter-only updates)
Testing
Added comprehensive unit tests covering:
- Updating with `dry_run=True`
- Both `id` and `filter` provided (should raise `ValueError`)
- Neither `id` nor `filter` provided (should raise `ValueError`)
- `matched_records` returned correctly
All existing tests continue to pass, ensuring backward compatibility.
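The validation rules exercised by the error-case tests above can be sketched as a small standalone function. This is an illustration only: `build_update_kwargs` is a hypothetical helper (not the SDK's `update_request()`), and its error messages are placeholders. It models just two behaviors described in this PR: `id` and `filter` are mutually exclusive with exactly one required, and `id` is left out of the request when it is `None`.

```python
from typing import Any, Dict, Optional


def build_update_kwargs(
    id: Optional[str] = None,
    filter: Optional[Dict[str, Any]] = None,
    dry_run: Optional[bool] = None,
    **fields: Any,
) -> Dict[str, Any]:
    """Hypothetical helper mirroring the update validation described in this PR.

    Parameter names mirror the SDK's keyword arguments, so they shadow the
    builtins `id` and `filter` inside this function.
    """
    # Exactly one of id / filter must be supplied.
    if id is not None and filter is not None:
        raise ValueError("Provide either 'id' or 'filter', not both")
    if id is None and filter is None:
        raise ValueError("Either 'id' or 'filter' must be provided")

    # Drop unset optional fields so a filter-only request carries no 'id' key.
    # dry_run is forwarded as-is; per this PR it only has meaning with a filter.
    candidate = {"id": id, "filter": filter, "dry_run": dry_run, **fields}
    return {key: value for key, value in candidate.items() if value is not None}


# Example: a filter-only dry run carries no "id" key.
print(build_update_kwargs(filter={"genre": {"$eq": "drama"}}, dry_run=True))
# {'filter': {'genre': {'$eq': 'drama'}}, 'dry_run': True}
```

Filtering out `None` values in one place keeps the "filter-only" request shape and the existing "ID-only" shape identical to what callers passed, which is the backward-compatibility property the tests rely on.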
Usage Examples
Update by ID (existing behavior)
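A minimal sketch of the existing ID-based call, which this PR leaves unchanged. It assumes `index` is an `Index` object already connected to your index (for example, from `Pinecone(api_key=...).Index(host=...)`); the record ID, metadata, and namespace are placeholder values.

```python
from typing import Any


def update_single_record(index: Any, namespace: str = "example-namespace") -> Any:
    # Existing behavior: target exactly one record by its ID.
    # "vec-1" and the metadata below are placeholder values.
    return index.update(
        id="vec-1",
        set_metadata={"genre": "drama"},  # fields to add or overwrite on that record
        namespace=namespace,
    )
```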
Update by Metadata Filter (new)
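A sketch of the new filter-based path: preview with `dry_run=True`, then apply. It assumes `index` is an already-connected `Index` object and that `update()` accepts the `filter` and `dry_run` keyword arguments added in this PR. Both responses are returned unmodified, since the exact way `matched_records` is exposed on the response object is not spelled out here; the filter and metadata values are placeholders.

```python
from typing import Any, Dict, Tuple


def tag_matching_records(index: Any, namespace: str = "example-namespace") -> Tuple[Any, Any]:
    # Metadata filter selecting every record whose "genre" is "documentary".
    genre_filter: Dict[str, Any] = {"genre": {"$eq": "documentary"}}

    # 1. Preview: with dry_run=True the update is not executed; the response
    #    reports how many records match the filter.
    preview = index.update(
        filter=genre_filter,
        set_metadata={"reviewed": True},
        namespace=namespace,
        dry_run=True,
    )

    # 2. Apply: the same call without dry_run updates every matching record.
    #    No `id` is passed, because `id` and `filter` are mutually exclusive.
    result = index.update(
        filter=genre_filter,
        set_metadata={"reviewed": True},
        namespace=namespace,
    )
    return preview, result
```

Running the dry run first is optional, but it is a cheap way to confirm a filter matches the records you expect before a bulk metadata change.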
Async Example
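The asyncio client takes the same parameters; the call is simply awaited. This sketch assumes `index` is an asyncio index object (for example, one opened with `async with Pinecone(api_key=...).IndexAsyncio(host=...) as index:`), and the filter values are placeholders.

```python
from typing import Any


async def preview_filter_update(index: Any, namespace: str = "example-namespace") -> Any:
    # Dry run: report how many records match "drama" or "comedy"
    # without modifying them.
    return await index.update(
        filter={"genre": {"$in": ["drama", "comedy"]}},
        set_metadata={"featured": False},
        namespace=namespace,
        dry_run=True,
    )
```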
gRPC Example
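With the gRPC client the call site should look the same as with REST: the filter stays a plain dict, and, per the implementation notes for `pinecone/grpc/index_grpc.py`, the client converts it to a `google.protobuf.Struct` internally. This sketch assumes `index` comes from something like `PineconeGRPC(api_key=...).Index(host=...)`, and uses a compound filter as a placeholder.

```python
from typing import Any


def apply_compound_filter_update(index: Any, namespace: str = "example-namespace") -> Any:
    # Plain-dict filter; the gRPC client handles the protobuf conversion.
    compound_filter = {
        "$and": [
            {"genre": {"$eq": "drama"}},
            {"year": {"$gte": 2020}},
        ]
    }
    # No dry_run here, so every matching record is updated.
    return index.update(
        filter=compound_filter,
        set_metadata={"era": "recent"},
        namespace=namespace,
    )
```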
Breaking Changes
None. This is a backward-compatible addition: existing code using `update(id=...)` continues to work unchanged.
Validation
- `id` and `filter` are mutually exclusive (exactly one must be provided)
- `dry_run` only has meaning when `filter` is provided
Testing
Related
This feature aligns with the existing `delete()` and `fetch_by_metadata()` methods, which already support metadata filters, providing a consistent API across vector operations.