Vector search with binary quantization: Elasticsearch with BBQ is 5x faster than OpenSearch with FAISS

Elastic has received requests from our community to clarify performance differences between Elasticsearch and OpenSearch, particularly in the realm of semantic search and vector search, so we conducted these performance tests to provide clear, data-driven comparisons.

Binary quantization showdown
Storing high-dimensional vectors in their original form can be memory-intensive. Quantization techniques compress these vectors into a compact representation, drastically reducing the memory footprint. The search then operates in the compressed space, which reduces the computational complexity and makes searches faster, especially in large datasets.
Elastic is committed to making Lucene a top-performing Vector Engine. We introduced Better Binary Quantization (BBQ) in Elasticsearch 8.16 on top of Lucene and evolved it further in 8.18 and 9.0. BBQ is built on a new approach in scalar quantization that reduces float32 dimensions to bits, delivering ~95% memory reduction while maintaining high ranking quality.
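To see where the ~95% figure comes from, here is a back-of-envelope sketch of the memory math for 1-bit quantization, using the 1,536 dimensions of the embedding model featured later in this post (illustrative only; real BBQ also stores small per-vector correction terms on top of the raw bits, which is why the practical saving is "~95%" rather than the raw bit ratio):

```python
# Back-of-envelope memory math for 1-bit quantization.
DIMS = 1536  # text-embedding-ada-002 dimensions (used later in this post)

float32_bytes = DIMS * 4   # 6144 bytes per raw float32 vector
one_bit_bytes = DIMS // 8  # 192 bytes per 1-bit-per-dimension vector

reduction = 1 - one_bit_bytes / float32_bytes
print(f"{float32_bytes} B -> {one_bit_bytes} B ({reduction:.1%} smaller)")
```

The raw bit ratio is 32x; the small per-vector metadata BBQ keeps for re-ranking brings the end-to-end saving to roughly 95%.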
OpenSearch, on the other hand, uses multiple vector engines: nmslib (now deprecated), Lucene, and FAISS. In a previous blog, we compared Elasticsearch and OpenSearch for vector search, using three different datasets and testing different combinations of engines and configurations on both products.
This blog focuses on the binary quantization algorithms currently available in both products. We tested Elasticsearch with BBQ and OpenSearch with FAISS’s Binary Quantization using the openai_vector Rally track.
The main objective was to evaluate the performance of both solutions under the same level of recall. What does recall mean? Recall is a metric that measures how many of the relevant results are successfully retrieved by a search system.
In this evaluation, recall@k is particularly important, where k represents the number of top results considered. Recall@10, Recall@50 and Recall@100 therefore measure how many of the true relevant results appear in the top 10, 50 and 100 retrieved items, respectively. Recall is expressed on a scale from 0 to 1 (or 0% to 100%). That matters here because we are measuring approximate kNN (ANN) rather than exact kNN, where recall is always 1 (100%).
For each value of k we also specified n, which is the number of candidates considered before applying the final ranking. This means that for Recall@10, Recall@50, and Recall@100, the system first retrieves n candidates using the binary quantization algorithm and then ranks them to determine whether the top k results contain the expected relevant items.
By controlling n, we can analyze the trade-off between efficiency and accuracy. A higher n typically increases recall, as more candidates are available for ranking, but it also increases latency and decreases throughput. Conversely, a lower n speeds up retrieval but may reduce recall if too few relevant candidates are included in the initial set.
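The recall@k definition above can be sketched in a few lines of Python (a toy illustration, not the Rally track's actual implementation):

```python
def recall_at_k(retrieved, relevant, k):
    """Fraction of the true top-k neighbors that appear in the first k results."""
    found = set(retrieved[:k]) & set(relevant[:k])
    return len(found) / k

# Toy example: 8 of the 10 true nearest neighbors (ids 0-9) appear in the
# approximate top-10, so recall@10 is 0.8.
true_top10 = list(range(10))
results = [0, 1, 2, 3, 4, 5, 6, 7, 98, 99]
print(recall_at_k(results, true_top10, 10))  # 0.8
```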
In this comparison, Elasticsearch demonstrated lower latency and higher throughput than OpenSearch on identical setups.
Methodology
The full configuration, alongside Terraform scripts, Kubernetes manifests and the specific Rally track is available in this repository under openai_vector_bq.
As with previous benchmarks, we used a Kubernetes cluster composed of:
- 1 node pool for Elasticsearch 9.0 with 3 e2-standard-32 machines (128GB RAM and 32 CPUs)
- 1 node pool for OpenSearch 2.19 with 3 e2-standard-32 machines (128GB RAM and 32 CPUs)
- 1 node pool for Rally with 2 e2-standard-4 machines (16GB RAM and 4 CPUs)

We set up one Elasticsearch cluster version 9.0 and one OpenSearch cluster version 2.19.
Both Elasticsearch and OpenSearch were tested with the exact same setup: the openai_vector Rally track with some modifications. The track uses 2.5 million documents from the NQ dataset, enriched with embeddings generated using OpenAI's text-embedding-ada-002 model.
The results report on measured latency and throughput at different recall levels (recall@10, recall@50 and recall@100) using 8 simultaneous clients for performing search operations. We used a single shard and no replicas.
We ran the following combinations of k-n-rescore. For example, 10-2000-2000 (k:10, n:2000, rescore:2000) retrieves the top k (10) from n candidates (2000), applying a rescore over 2000 results (equivalent to an “oversample factor” of 1). Each search ran 10,000 times, with 1,000 searches as warmup:
Recall@10
- 10-40-40
- 10-50-50
- 10-100-100
- 10-200-200
- 10-500-500
- 10-750-750
- 10-1000-1000
- 10-1500-1500
- 10-2000-2000
Recall@50
- 50-150-150
- 50-200-200
- 50-250-250
- 50-500-500
- 50-750-750
- 50-1000-1000
- 50-1200-1200
- 50-1500-1500
- 50-2000-2000
Recall@100
- 100-200-200
- 100-250-250
- 100-300-300
- 100-500-500
- 100-750-750
- 100-1000-1000
- 100-1200-1200
- 100-1500-1500
- 100-2000-2000
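As a sketch of how one such combination maps onto an actual query, a 10-2000-2000 run corresponds roughly to an Elasticsearch kNN search body like the one below. The field name and query vector are placeholders, and the `rescore_vector` option is how recent Elasticsearch versions express oversampled full-precision rescoring; the real query templates live in the Rally track linked above:

```python
# Hypothetical sketch of the 10-2000-2000 combination as an Elasticsearch
# kNN search body (field name and vector values are placeholders).
k, n, rescore = 10, 2000, 2000

knn_query = {
    "knn": {
        "field": "emb",                 # placeholder field name
        "query_vector": [0.12] * 1536,  # placeholder embedding
        "k": k,                         # top results returned
        "num_candidates": n,            # candidates gathered by the ANN search
        # Oversample factor: how many quantized hits get re-scored with
        # full-precision vectors, relative to k*shards (rescore / n = 1 here).
        "rescore_vector": {"oversample": rescore / n},
    }
}
print(knn_query["knn"]["k"], knn_query["knn"]["num_candidates"])
```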
To replicate the benchmark, the Kubernetes manifests for both rally-elasticsearch and rally-opensearch have all the relevant variables externalized in a ConfigMap, available here (ES) and here (OS). The search_ops parameter can be customized to test any combination of k, n and rescore.
OpenSearch Rally configuration
/k8s/rally-openai_vector-os-bq.yml
OpenSearch index configuration
The variables from the ConfigMap are then used in the index configuration; some parameters are left unchanged. 1-bit quantization in OpenSearch is configured by setting the compression level to “32x”.
index-vectors-only-mapping-with-docid-mapping.json
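For orientation, the relevant part of such a mapping looks roughly like the sketch below. The field name and dimension are placeholders, and the exact settings used in the benchmark are in the linked file; `compression_level: "32x"` is what selects 1-bit quantization, since 32x is exactly the ratio of a 32-bit float to 1 bit:

```python
# Hypothetical sketch of an OpenSearch knn_vector field with 1-bit
# binary quantization (placeholders; see the linked mapping file).
os_mapping = {
    "properties": {
        "emb": {                          # placeholder field name
            "type": "knn_vector",
            "dimension": 1536,
            "compression_level": "32x",   # 32 bits -> 1 bit per dimension
        }
    }
}

# "32x" is the float32-to-1-bit ratio:
assert 32 // 1 == 32
```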
Elasticsearch Rally configuration
/k8s/rally-openai_vector-es-bq.yml
Elasticsearch index configuration
index-vectors-only-mapping-with-docid-mapping.json
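On the Elasticsearch side, BBQ is enabled through the dense_vector field's `index_options`. A rough sketch of the relevant part of the mapping is below; the field name, dimension and similarity are placeholders, and the exact settings used in the benchmark are in the linked file:

```python
# Hypothetical sketch of an Elasticsearch dense_vector field with BBQ
# enabled (placeholders; see the linked mapping file).
es_mapping = {
    "properties": {
        "emb": {                                   # placeholder field name
            "type": "dense_vector",
            "dims": 1536,
            "index": True,
            "index_options": {"type": "bbq_hnsw"}, # HNSW graph over BBQ vectors
        }
    }
}
```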
Results
There are multiple ways to interpret the results. For both latency and throughput, we plotted a simplified and a detailed chart at each level of recall. Differences are easiest to see when “higher is better” holds for every metric; however, lower latency is better, while higher throughput is better. For the simplified charts, we therefore used (recall / latency) * 10000 (called simply “speed”) and recall * throughput, so that for both derived metrics, higher means better. Let’s get to it.
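The two derived metrics are trivial to compute; here they are applied to one sample row from the Recall@10 table below (Elasticsearch-9.0-BBQ, 10-100-100):

```python
def speed(recall, latency_ms):
    """Simplified chart metric: (recall / latency) * 10000. Higher is better."""
    return recall / latency_ms * 10000

def weighted_throughput(recall, throughput):
    """Simplified chart metric: recall * throughput. Higher is better."""
    return recall * throughput

# Sample row: latency 11.70 ms, throughput 513.58 ops/s, recall 0.89.
print(round(speed(0.89, 11.70), 1))                 # 760.7
print(round(weighted_throughput(0.89, 513.58), 1))  # 457.1
```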
Recall @ 10 - simplified
At that level of recall, Elasticsearch BBQ is up to 5x faster (3.9x faster on average) and has 3.2x more throughput on average than OpenSearch FAISS.


Recall @ 10 - detailed


engine | task | latency.mean (ms) | throughput.mean (ops/s) | avg_recall |
---|---|---|---|---|
Elasticsearch-9.0-BBQ | 10-100-100 | 11.70 | 513.58 | 0.89 |
Elasticsearch-9.0-BBQ | 10-1000-1000 | 27.33 | 250.55 | 0.95 |
Elasticsearch-9.0-BBQ | 10-1500-1500 | 35.93 | 197.26 | 0.95 |
Elasticsearch-9.0-BBQ | 10-200-200 | 13.33 | 456.16 | 0.92 |
Elasticsearch-9.0-BBQ | 10-2000-2000 | 44.27 | 161.40 | 0.95 |
Elasticsearch-9.0-BBQ | 10-40-40 | 10.97 | 539.94 | 0.84 |
Elasticsearch-9.0-BBQ | 10-50-50 | 11.00 | 535.73 | 0.85 |
Elasticsearch-9.0-BBQ | 10-500-500 | 19.52 | 341.45 | 0.93 |
Elasticsearch-9.0-BBQ | 10-750-750 | 22.94 | 295.19 | 0.94 |
OpenSearch-2.19-faiss | 10-100-100 | 35.59 | 200.61 | 0.94 |
OpenSearch-2.19-faiss | 10-1000-1000 | 156.81 | 58.30 | 0.96 |
OpenSearch-2.19-faiss | 10-1500-1500 | 181.79 | 42.97 | 0.96 |
OpenSearch-2.19-faiss | 10-200-200 | 47.91 | 155.16 | 0.95 |
OpenSearch-2.19-faiss | 10-2000-2000 | 232.14 | 31.84 | 0.96 |
OpenSearch-2.19-faiss | 10-40-40 | 27.55 | 249.25 | 0.92 |
OpenSearch-2.19-faiss | 10-50-50 | 28.78 | 245.14 | 0.92 |
OpenSearch-2.19-faiss | 10-500-500 | 79.44 | 97.06 | 0.96 |
OpenSearch-2.19-faiss | 10-750-750 | 104.19 | 75.49 | 0.96 |
Recall @ 50 - simplified
At that level of recall, Elasticsearch BBQ is up to 5x faster (4.2x faster on average) and has 3.9x more throughput on average than OpenSearch FAISS.


Recall @ 50 - detailed


engine | task | latency.mean (ms) | throughput.mean (ops/s) | avg_recall |
---|---|---|---|---|
Elasticsearch-9.0-BBQ | 50-1000-1000 | 25.71 | 246.44 | 0.95 |
Elasticsearch-9.0-BBQ | 50-1200-1200 | 28.81 | 227.85 | 0.95 |
Elasticsearch-9.0-BBQ | 50-150-150 | 13.43 | 362.90 | 0.90 |
Elasticsearch-9.0-BBQ | 50-1500-1500 | 33.38 | 202.37 | 0.95 |
Elasticsearch-9.0-BBQ | 50-200-200 | 12.99 | 406.30 | 0.91 |
Elasticsearch-9.0-BBQ | 50-2000-2000 | 42.63 | 163.68 | 0.95 |
Elasticsearch-9.0-BBQ | 50-250-250 | 14.41 | 373.21 | 0.92 |
Elasticsearch-9.0-BBQ | 50-500-500 | 17.15 | 341.04 | 0.93 |
Elasticsearch-9.0-BBQ | 50-750-750 | 31.25 | 248.60 | 0.94 |
OpenSearch-2.19-faiss | 50-1000-1000 | 125.35 | 62.53 | 0.96 |
OpenSearch-2.19-faiss | 50-1200-1200 | 143.87 | 54.75 | 0.96 |
OpenSearch-2.19-faiss | 50-150-150 | 43.64 | 130.01 | 0.89 |
OpenSearch-2.19-faiss | 50-1500-1500 | 169.45 | 46.35 | 0.96 |
OpenSearch-2.19-faiss | 50-200-200 | 48.05 | 156.07 | 0.91 |
OpenSearch-2.19-faiss | 50-2000-2000 | 216.73 | 36.38 | 0.96 |
OpenSearch-2.19-faiss | 50-250-250 | 53.52 | 142.44 | 0.93 |
OpenSearch-2.19-faiss | 50-500-500 | 78.98 | 97.82 | 0.95 |
OpenSearch-2.19-faiss | 50-750-750 | 103.20 | 75.86 | 0.96 |
Recall @ 100 - simplified
At that level of recall, Elasticsearch BBQ is up to 5x faster (4.6x faster on average) and has 3.9x more throughput on average than OpenSearch FAISS.


Recall @ 100 - detailed


engine | task | latency.mean (ms) | throughput.mean (ops/s) | avg_recall |
---|---|---|---|---|
Elasticsearch-9.0-BBQ | 100-1000-1000 | 27.82 | 243.22 | 0.95 |
Elasticsearch-9.0-BBQ | 100-1200-1200 | 31.14 | 224.04 | 0.95 |
Elasticsearch-9.0-BBQ | 100-1500-1500 | 35.98 | 193.99 | 0.95 |
Elasticsearch-9.0-BBQ | 100-200-200 | 14.18 | 403.86 | 0.88 |
Elasticsearch-9.0-BBQ | 100-2000-2000 | 45.36 | 159.88 | 0.95 |
Elasticsearch-9.0-BBQ | 100-250-250 | 14.77 | 433.06 | 0.90 |
Elasticsearch-9.0-BBQ | 100-300-300 | 14.61 | 375.54 | 0.91 |
Elasticsearch-9.0-BBQ | 100-500-500 | 18.88 | 340.37 | 0.93 |
Elasticsearch-9.0-BBQ | 100-750-750 | 23.59 | 285.79 | 0.94 |
OpenSearch-2.19-faiss | 100-1000-1000 | 142.90 | 58.48 | 0.95 |
OpenSearch-2.19-faiss | 100-1200-1200 | 153.03 | 51.04 | 0.95 |
OpenSearch-2.19-faiss | 100-1500-1500 | 181.79 | 43.20 | 0.96 |
OpenSearch-2.19-faiss | 100-200-200 | 50.94 | 131.62 | 0.83 |
OpenSearch-2.19-faiss | 100-2000-2000 | 232.53 | 33.67 | 0.96 |
OpenSearch-2.19-faiss | 100-250-250 | 57.08 | 131.23 | 0.87 |
OpenSearch-2.19-faiss | 100-300-300 | 62.76 | 120.10 | 0.89 |
OpenSearch-2.19-faiss | 100-500-500 | 84.36 | 91.54 | 0.93 |
OpenSearch-2.19-faiss | 100-750-750 | 111.33 | 69.95 | 0.94 |
Improvements on BBQ
BBQ has come a long way since its first release in Elasticsearch 8.16. For the sake of comparison, we included a benchmark run from 8.16 alongside the current one, and we can see how recall and latency have improved since then.

In Elasticsearch 8.18 and 9.0, we rewrote the core algorithm for quantizing the vectors. So, while BBQ in 8.16 was good, the newest versions are even better. You can read about it here and here. In short, every vector is individually quantized through optimized scalar quantiles. As a result, users benefit from higher accuracy in vector search without compromising performance, making Elasticsearch’s vector retrieval even more powerful.
Conclusion
In this performance comparison between Elasticsearch BBQ and OpenSearch FAISS, Elasticsearch significantly outperforms OpenSearch for vector search, achieving up to 5x faster query speeds and 3.9x higher throughput on average across various levels of recall.
Key findings include:
- Recall@10: Elasticsearch BBQ is up to 5x faster (3.9x faster on average) and has 3.2x more throughput on average compared to OpenSearch FAISS.
- Recall@50: Elasticsearch BBQ is up to 5x faster (4.2x faster on average) and has 3.9x more throughput on average compared to OpenSearch FAISS.
- Recall@100: Elasticsearch BBQ is up to 5x faster (4.6x faster on average) and has 3.9x more throughput on average compared to OpenSearch FAISS.
These results highlight the efficiency and performance advantages of Elasticsearch BBQ, particularly in high-dimensional vector search scenarios. The Better Binary Quantization (BBQ) technique, introduced in Elasticsearch 8.16, provides substantial memory reduction (~95%) while maintaining high ranking quality, making it a superior choice for large-scale vector search applications.
At Elastic, we are relentlessly innovating to improve Apache Lucene and Elasticsearch to provide the best vector database for search and retrieval use cases, including RAG (Retrieval Augmented Generation). Our recent advancements have dramatically increased performance, making vector search faster and more space efficient than before, building upon the gains from Lucene 10. This blog is another illustration of that innovation.
Try out vector search for yourself using this self-paced hands-on learning for Search AI. You can start a free cloud trial or try Elastic on your local machine now.