@jhamon commented Nov 4, 2025

Add Support for Update by Metadata Filter

Summary

This PR adds support for updating vectors by metadata filter across all API implementations (REST, Asyncio, and gRPC). Previously, the update method only supported updating vectors by ID. Now users can update multiple vectors at once by specifying a metadata filter, similar to how delete and fetch_by_metadata work.

Background

The core OpenAPI layer (pinecone/core/openapi/db_data/model/update_request.py) already supported the filter and dry_run parameters in UpdateRequest, but these were not exposed in the user-facing API. This PR makes these features accessible to users.

Changes

API Updates

  • Added filter parameter: A metadata filter expression that selects which vectors to update
  • Added dry_run parameter: When True, returns the count of matching vectors without executing the update
  • Made id parameter optional: When filter is provided, id must be None
  • Response enhancement: Returns matched_records count when using filter (even without dry_run)
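To make the semantics of the new parameters concrete, here is a toy in-memory model of update-by-filter: the filter selects records, `dry_run=True` only counts them, and filter-based calls report `matched_records`. It is a sketch only, assuming a plain dict of records and supporting just implicit equality plus `$eq` and `$gte`. `InMemoryIndex` and `matches` are hypothetical names, and this is not how the Pinecone service or SDK implements the feature.

```python
from typing import Any, Dict, List, Optional


def matches(metadata: Dict[str, Any], flt: Dict[str, Any]) -> bool:
    """Return True if metadata satisfies every clause of a (tiny subset of a) metadata filter."""
    for field, cond in flt.items():
        value = metadata.get(field)
        if isinstance(cond, dict):
            for op, operand in cond.items():
                if op == "$eq":
                    if value != operand:
                        return False
                elif op == "$gte":
                    if value is None or value < operand:
                        return False
                else:
                    raise ValueError(f"operator {op!r} is not supported in this sketch")
        elif value != cond:
            # A bare value is treated as implicit equality, e.g. {"status": "pending"}
            return False
    return True


class InMemoryIndex:
    """Hypothetical stand-in for an index: id -> {"values": [...], "metadata": {...}}."""

    def __init__(self, records: Dict[str, Dict[str, Any]]):
        self.records = records

    def update(
        self,
        id: Optional[str] = None,
        values: Optional[List[float]] = None,
        set_metadata: Optional[Dict[str, Any]] = None,
        filter: Optional[Dict[str, Any]] = None,
        dry_run: bool = False,
    ) -> Dict[str, Any]:
        if (id is None) == (filter is None):
            raise ValueError("Provide exactly one of id or filter")
        if id is not None:
            targets = [id] if id in self.records else []
        else:
            targets = [rid for rid, rec in self.records.items() if matches(rec["metadata"], filter)]
        if filter is not None and dry_run:
            # Dry run: report how many records match, change nothing
            return {"matched_records": len(targets)}
        for rid in targets:
            rec = self.records[rid]
            if values is not None:
                rec["values"] = values
            if set_metadata:
                # Merge the given fields into the existing metadata
                rec["metadata"].update(set_metadata)
        return {"matched_records": len(targets)} if filter is not None else {}
```

With three records tagged `comedy/2021`, `drama/2019` and `comedy/2018`, a dry run with `{'year': {'$gte': 2020}}` reports one match and leaves the data untouched, while a real update with `{'genre': {'$eq': 'comedy'}}` modifies and reports two.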

Implementation Details

  1. Interface Definitions (pinecone/db_data/interfaces.py, pinecone/db_data/index_asyncio_interface.py)

    • Updated abstract update() methods to include filter and dry_run parameters
    • Added comprehensive docstrings with examples for both update-by-ID and update-by-filter use cases
  2. REST Implementation (pinecone/db_data/index.py)

    • Added validation to ensure id and filter are mutually exclusive
    • Added validation to ensure at least one of id or filter is provided
    • Updated response handling to include matched_records when present
  3. Asyncio Implementation (pinecone/db_data/index_asyncio.py)

    • Same changes as REST implementation for async support
  4. gRPC Implementation (pinecone/grpc/index_grpc.py)

    • Added filter conversion to google.protobuf.Struct using existing dict_to_proto_struct() utility
    • Updated response parsing to handle matched_records
    • Updated parse_update_response() in pinecone/grpc/utils.py to extract matched_records from responses
  5. Request Factory (pinecone/db_data/request_factory.py)

    • Updated update_request() to explicitly handle filter and dry_run parameters
    • Only includes id in request when it's not None (to support filter-only updates)
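The request-factory change amounts to building the request body only from arguments that were actually supplied, so a filter-only update carries no `id`. The sketch below shows that idea with a plain dict. `build_update_body` is a hypothetical helper, and its keys simply mirror the keyword arguments described in this PR; it is not the SDK's `update_request()` or the exact `UpdateRequest` model.

```python
from typing import Any, Dict, List, Optional


def build_update_body(
    id: Optional[str] = None,
    values: Optional[List[float]] = None,
    set_metadata: Optional[Dict[str, Any]] = None,
    namespace: Optional[str] = None,
    filter: Optional[Dict[str, Any]] = None,
    dry_run: Optional[bool] = None,
) -> Dict[str, Any]:
    """Collect only the provided arguments, so a filter-only update omits id entirely."""
    candidates = {
        "id": id,
        "values": values,
        "set_metadata": set_metadata,
        "namespace": namespace,
        "filter": filter,
        "dry_run": dry_run,
    }
    # Drop None entries; falsy-but-meaningful values such as dry_run=False are kept
    return {key: value for key, value in candidates.items() if value is not None}


body = build_update_body(
    filter={"genre": {"$eq": "comedy"}},
    set_metadata={"status": "active"},
    namespace="my_namespace",
)
print(body)
```

Filtering on `is not None` rather than truthiness matters here: an explicit `dry_run=False` or an empty `set_metadata={}` is still passed through as the caller wrote it.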

Testing

Added comprehensive unit tests covering:

  • Update by ID (backward compatibility verification)
  • Update by filter with metadata
  • Update by filter with values
  • Update by filter with dry_run=True
  • Validation: both id and filter provided (should raise ValueError)
  • Validation: neither id nor filter provided (should raise ValueError)
  • Response handling: matched_records returned correctly
  • gRPC async requests with filter

All existing tests continue to pass, ensuring backward compatibility.

Usage Examples

Update by ID (existing behavior)

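from pinecone import Pinecone
index = Pinecone().Index(host="your-index-host")  # reads PINECONE_API_KEY from the environment

# Update a single vector by ID
index.update(id='id1', values=[1, 2, 3], namespace='my_namespace')
index.update(id='id1', set_metadata={'key': 'value'}, namespace='my_namespace')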

Update by Metadata Filter (new)

# Update metadata for all vectors matching a filter
index.update(
    filter={'genre': {'$eq': 'comedy'}},
    set_metadata={'status': 'active'},
    namespace='my_namespace'
)

# Preview how many vectors would be updated (dry run)
result = index.update(
    filter={'year': {'$gte': 2020}},
    set_metadata={'updated': True},
    dry_run=True,
    namespace='my_namespace'
)
print(f"Would update {result.get('matched_records', 0)} vectors")
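Because `dry_run=True` returns a count without writing anything, it can gate a bulk update. The helper below is a hypothetical sketch, not part of the SDK: it previews first and applies the update only when the match count is within a caller-chosen limit. Like the examples above, it assumes the response supports `.get('matched_records')`. `FakeIndex` is a stub so the sketch runs without a live index.

```python
from typing import Any, Dict, List


def update_with_preview(index: Any, filter: Dict[str, Any], set_metadata: Dict[str, Any],
                        namespace: str, max_records: int) -> int:
    """Dry-run a filter update, then apply it only if it touches at most max_records vectors."""
    preview = index.update(filter=filter, set_metadata=set_metadata,
                           dry_run=True, namespace=namespace)
    matched = preview.get("matched_records", 0)
    if matched > max_records:
        raise RuntimeError(f"filter matches {matched} vectors; limit is {max_records}")
    index.update(filter=filter, set_metadata=set_metadata, namespace=namespace)
    return matched


class FakeIndex:
    """Hypothetical stub that records each call and reports a fixed match count."""

    def __init__(self, matched: int):
        self.matched = matched
        self.calls: List[Dict[str, Any]] = []

    def update(self, **kwargs: Any) -> Dict[str, Any]:
        self.calls.append(kwargs)
        return {"matched_records": self.matched}


fake = FakeIndex(matched=3)
count = update_with_preview(fake, {"year": {"$gte": 2020}}, {"updated": True},
                            "my_namespace", max_records=10)
print(f"Updated {count} vectors")
```

The preview and the real update are two separate calls, so records can change in between. Treat the dry-run count as an estimate rather than a guarantee.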

Async Example

import asyncio
from pinecone import Pinecone

async def main():
    pc = Pinecone()
    async with pc.IndexAsyncio(host="your-index-host") as idx:
        # Update by filter
        result = await idx.update(
            filter={'status': 'pending'},
            set_metadata={'status': 'processed'},
            namespace='my_namespace'
        )
        print(f"Updated {result.get('matched_records', 0)} vectors")

asyncio.run(main())
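The async client also makes it straightforward to issue several filter updates concurrently, for example one per namespace. Namespaces are disjoint, so the updates cannot overlap. This is a minimal sketch. `StubAsyncIndex` is a hypothetical stand-in for the object returned by `pc.IndexAsyncio(...)` so the example runs offline, and `mark_processed` is an illustrative helper name.

```python
import asyncio
from typing import Any, Dict, List


class StubAsyncIndex:
    """Hypothetical async stand-in returning a canned matched_records count per namespace."""

    def __init__(self, counts: Dict[str, int]):
        self.counts = counts

    async def update(self, **kwargs: Any) -> Dict[str, Any]:
        await asyncio.sleep(0)  # yield control, as a real network call would
        return {"matched_records": self.counts.get(kwargs["namespace"], 0)}


async def mark_processed(idx: Any, namespaces: List[str]) -> int:
    """Run one filter update per namespace concurrently and total the matched counts."""
    results = await asyncio.gather(*(
        idx.update(filter={"status": "pending"},
                   set_metadata={"status": "processed"},
                   namespace=ns)
        for ns in namespaces
    ))
    return sum(r.get("matched_records", 0) for r in results)


total = asyncio.run(mark_processed(StubAsyncIndex({"ns1": 4, "ns2": 2}), ["ns1", "ns2", "ns3"]))
print(f"Updated {total} vectors")
```

With a real client, `idx` would be the handle from `async with pc.IndexAsyncio(host=...) as idx:`.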

gRPC Example

from pinecone.grpc import PineconeGRPC

index = PineconeGRPC().Index(host="your-index-host")

# Update by filter with async request; returns a future
future = index.update(
    filter={'genre': {'$eq': 'drama'}},
    set_metadata={'rating': 'high'},
    namespace='my_namespace',
    async_req=True
)
result = future.result()
print(f"Updated {result.get('matched_records', 0)} vectors")

Breaking Changes

None - This is a backward-compatible addition. Existing code using update(id=...) continues to work unchanged.

Validation

  • id and filter are mutually exclusive (exactly one must be provided)
  • dry_run only has meaning when filter is provided
  • All implementations (REST, Asyncio, gRPC) enforce the same validation rules
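The validation rules above can be expressed as a small standalone check. This is a sketch of the described behavior, not the SDK's code. `validate_update_args` is a hypothetical name. Following the PR text, `dry_run` is not validated here, since it simply has no effect on an update by ID.

```python
from typing import Any, Dict, Optional


def validate_update_args(id: Optional[str] = None,
                         filter: Optional[Dict[str, Any]] = None) -> str:
    """Enforce 'exactly one of id or filter' and return which mode the update runs in."""
    if id is not None and filter is not None:
        raise ValueError("Cannot provide both 'id' and 'filter'; choose one.")
    if id is None and filter is None:
        raise ValueError("Either 'id' or 'filter' must be provided.")
    return "by_id" if id is not None else "by_filter"


# The two failure cases listed under Testing: both arguments, and neither
for bad in ({"id": "id1", "filter": {"genre": "comedy"}}, {}):
    try:
        validate_update_args(**bad)
    except ValueError as err:
        print("rejected:", err)
```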

Test Results

  • ✅ All unit tests pass
  • ✅ All gRPC unit tests pass
  • ✅ Backward compatibility verified (existing update-by-ID tests pass)
  • ✅ New functionality tested across all implementations
  • ✅ Edge cases covered (validation errors, empty responses, etc.)

Related

This feature aligns with the existing delete() and fetch_by_metadata() methods, which already support metadata filters, providing a consistent API across vector operations.

@jhamon changed the base branch from main to release-candidate/2025-10 on November 4, 2025 12:04
@jhamon force-pushed the jhamon/update-by-metadata branch from bfad7d1 to efbbfbf on November 4, 2025 17:45
@jhamon force-pushed the jhamon/update-by-metadata branch from ccecbe4 to 413bdd5 on November 5, 2025 13:36