Skip to content

Integrate ANN into _search endpoint #87625

Closed
@jtibshirani

Description

@jtibshirani

Currently approximate nearest neighbor search is powered by its own _knn_search endpoint. We introduced ANN in a separate endpoint because there were big open questions around how it should integrate with the search API (for example around score combination, aggregations, pagination, etc.) We now have a better understanding on many of these topics and plan to integrate ANN into _search.

Proposed API

ANN will be powered by a new knn section in the search request:

POST products/_search
{
  "query": {
    "multi_match": {
      "query": "black flower dress",
      "fields": ["title", "description"],
      "boost": 0.9
    }
  },
  "knn": {
    "field": "title_vector",
    "query_vector": [0.3, 0.1, ...],
    "k": 5,
    "num_candidates": 50,
    "boost": 0.1
  },
  "size": 20
}

This search finds the global top k=5 vector results, finds all term-based matches, combines them, and finally takes the top 20 top-scoring results. One way to think about it: it's just a normal search, but the knn section adds k additional matching documents based on vector similarity, like taking a boolean disjunction between the query and the ANN search. You can page through search results as usual, in case you've retrieved more interesting vector and term-based matches than fit on one page.

Scoring. In this initial version we'll compute the score of each hit as a weighted sum of the vector and term-based scores. This matches the usual way scores are combined in Elasticsearch, by summing query scores with optional boosts. In the future we can consider other ways of combining scores like reciprocal rank fusion.

Retrieval. The top k vector results represent the global nearest neighbors across all shards. This contrasts with a shard-centric approach, where we would retrieve the k nearest neighbors per shard. In a shard-centric approach, there would actually be k * num_shards matches, which you could see while paging through results, etc. We think the global behavior will be more predictable and easier to understand, especially when searching across multiple indices, where each index could have a different number of shards.

The new knn section would still be marked experimental/ technical preview. Making ANN generally available (GA) is a separate step.

Implementation

A naive implementation would create a bool query on each shard, taking a disjunction between the term-based query and a "knn" query that retrieves k nearest documents. But this would give the shard-centric behavior described above, where there are k * num_shards total vector matches, instead of a global top k.

Our current plan is to perform ANN during an initial search phase that runs before the query phase. Specifically:

  1. In the DFS phase, collect the top k vector matches per shard, combine them and keep the global top k. (DFS is an existing phase that runs before the query phase, and collects the distributed term statistics. We would be adding a new step to the phase.)
  2. Then move to the query phase. Convert the top k vector matches into a special "doc and score" query that matches only those top documents. For the final query, take a boolean disjunction between this "doc and score" query and the term-based query that was passed in.

All other search phases work the same (like fetching hits, field collapsing, etc.) as they just operate on the result of the query phase.

Steps
The feature will be developed on a feature branch.

Follow-ups (can be done after initial merge):

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions