Vector search with binary quantization: Elasticsearch with BBQ is 5x faster than OpenSearch with FAISS

Elastic has received requests from our community to clarify performance differences between Elasticsearch and OpenSearch, particularly in the realm of semantic search and vector search, so we conducted these performance tests to provide clear, data-driven comparisons.

Binary quantization showdown
Storing high-dimensional vectors in their original form can be memory-intensive. Quantization techniques compress these vectors into a compact representation, drastically reducing the memory footprint. The search then operates in the compressed space, which reduces the computational complexity and makes searches faster, especially in large datasets.
Elastic is committed to making Lucene a top-performing Vector Engine. We introduced Better Binary Quantization (BBQ) in Elasticsearch 8.16 on top of Lucene and evolved it further in 8.18 and 9.0. BBQ is built on a new approach in scalar quantization that reduces float32 dimensions to bits, delivering ~95% memory reduction while maintaining high ranking quality.
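To see where the ~95% figure comes from, here is a back-of-envelope sketch of the memory math for 1-bit quantization, using the 1,536 dimensions of the embedding model featured later in this post (illustrative only; real BBQ also stores small per-vector correction terms on top of the raw bits, which is why the practical saving is "~95%" rather than the raw bit ratio):

```python
# Back-of-envelope memory math for 1-bit quantization.
DIMS = 1536  # text-embedding-ada-002 dimensions (used later in this post)

float32_bytes = DIMS * 4   # 6144 bytes per raw float32 vector
one_bit_bytes = DIMS // 8  # 192 bytes per 1-bit-per-dimension vector

reduction = 1 - one_bit_bytes / float32_bytes
print(f"{float32_bytes} B -> {one_bit_bytes} B ({reduction:.1%} smaller)")
```

The raw bit ratio is 32x; the small per-vector metadata BBQ keeps for re-ranking brings the end-to-end saving to roughly 95%.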
OpenSearch, on the other hand, uses multiple vector engines: nmslib (now deprecated), Lucene, and FAISS. In a previous blog, we compared Elasticsearch and OpenSearch for vector search, using three different datasets and testing different combinations of engines and configurations on both products.
This blog focuses on the binary quantization algorithms currently available in both products. We tested Elasticsearch with BBQ and OpenSearch with FAISS’s Binary Quantization using the openai_vector Rally track.
The main objective was to evaluate the performance of both solutions under the same level of recall. What does recall mean? Recall is a metric that measures how many of the relevant results are successfully retrieved by a search system.
In this evaluation, recall@k is particularly important, where k represents the number of top results considered. Recall@10, Recall@50 and Recall@100 therefore measure how many of the true relevant results appear in the top 10, 50 and 100 retrieved items, respectively. Recall is expressed on a scale from 0 to 1 (or 0% to 100%). That matters here because we are measuring approximate kNN (ANN) rather than exact kNN, where recall is always 1 (100%).
For each value of k we also specified n, which is the number of candidates considered before applying the final ranking. This means that for Recall@10, Recall@50, and Recall@100, the system first retrieves n candidates using the binary quantization algorithm and then ranks them to determine whether the top k results contain the expected relevant items.
By controlling n, we can analyze the trade-off between efficiency and accuracy. A higher n typically increases recall, as more candidates are available for ranking, but it also increases latency and decreases throughput. Conversely, a lower n speeds up retrieval but may reduce recall if too few relevant candidates are included in the initial set.
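The recall@k definition above can be sketched in a few lines of Python (a toy illustration, not the Rally track's actual implementation):

```python
def recall_at_k(retrieved, relevant, k):
    """Fraction of the true top-k neighbors that appear in the first k results."""
    found = set(retrieved[:k]) & set(relevant[:k])
    return len(found) / k

# Toy example: 8 of the 10 true nearest neighbors (ids 0-9) appear in the
# approximate top-10, so recall@10 is 0.8.
true_top10 = list(range(10))
results = [0, 1, 2, 3, 4, 5, 6, 7, 98, 99]
print(recall_at_k(results, true_top10, 10))  # 0.8
```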
In this comparison, Elasticsearch demonstrated lower latency and higher throughput than OpenSearch on identical setups.
Methodology
The full configuration, alongside Terraform scripts, Kubernetes manifests and the specific Rally track is available in this repository under openai_vector_bq.
As with previous benchmarks, we used a Kubernetes cluster composed of:
- 1 node pool for Elasticsearch 9.0 with 3 e2-standard-32 machines (128GB RAM and 32 CPUs)
- 1 node pool for OpenSearch 2.19 with 3 e2-standard-32 machines (128GB RAM and 32 CPUs)
- 1 node pool for Rally with 2 e2-standard-4 machines (16GB RAM and 4 CPUs)

We set up one Elasticsearch cluster version 9.0 and one OpenSearch cluster version 2.19.
Both Elasticsearch and OpenSearch were tested with the exact same setup: the openai_vector Rally track with some modifications. The track uses 2.5 million documents from the NQ dataset, enriched with embeddings generated using OpenAI's text-embedding-ada-002 model.
The results report on measured latency and throughput at different recall levels (recall@10, recall@50 and recall@100) using 8 simultaneous clients for performing search operations. We used a single shard and no replicas.
We ran the following combinations of k-n-rescore. For example, 10-2000-2000 (k:10, n:2000, rescore:2000) retrieves the top k (10) from n candidates (2000), applying a rescore over 2000 results (equivalent to an “oversample factor” of 1). Each search ran 10,000 times, with 1,000 searches as warmup:
Recall@10
- 10-40-40
- 10-50-50
- 10-100-100
- 10-200-200
- 10-500-500
- 10-750-750
- 10-1000-1000
- 10-1500-1500
- 10-2000-2000
Recall@50
- 50-150-150
- 50-200-200
- 50-250-250
- 50-500-500
- 50-750-750
- 50-1000-1000
- 50-1200-1200
- 50-1500-1500
- 50-2000-2000
Recall@100
- 100-200-200
- 100-250-250
- 100-300-300
- 100-500-500
- 100-750-750
- 100-1000-1000
- 100-1200-1200
- 100-1500-1500
- 100-2000-2000
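As a sketch of how one such combination maps onto an actual query, a 10-2000-2000 run corresponds roughly to an Elasticsearch kNN search body like the one below. The field name and query vector are placeholders, and the `rescore_vector` option is how recent Elasticsearch versions express oversampled full-precision rescoring; the real query templates live in the Rally track linked above:

```python
# Hypothetical sketch of the 10-2000-2000 combination as an Elasticsearch
# kNN search body (field name and vector values are placeholders).
k, n, rescore = 10, 2000, 2000

knn_query = {
    "knn": {
        "field": "emb",                 # placeholder field name
        "query_vector": [0.12] * 1536,  # placeholder embedding
        "k": k,                         # top results returned
        "num_candidates": n,            # candidates gathered by the ANN search
        # Oversample factor: how many quantized hits get re-scored with
        # full-precision vectors, relative to k*shards (rescore / n = 1 here).
        "rescore_vector": {"oversample": rescore / n},
    }
}
print(knn_query["knn"]["k"], knn_query["knn"]["num_candidates"])
```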
To replicate the benchmark, the Kubernetes manifests for both rally-elasticsearch and rally-opensearch have all the relevant variables externalized in a ConfigMap, available here (ES) and here (OS). The search_ops parameter can be customized to test any combination of k, n and rescore.
OpenSearch Rally configuration
/k8s/rally-openai_vector-os-bq.yml
OpenSearch index configuration
The variables from the ConfigMap are then used in the index configuration; some parameters are left unchanged. 1-bit quantization in OpenSearch is configured by setting the compression level to “32x”.
index-vectors-only-mapping-with-docid-mapping.json
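For orientation, the relevant part of such a mapping looks roughly like the sketch below. The field name and dimension are placeholders, and the exact settings used in the benchmark are in the linked file; `compression_level: "32x"` is what selects 1-bit quantization, since 32x is exactly the ratio of a 32-bit float to 1 bit:

```python
# Hypothetical sketch of an OpenSearch knn_vector field with 1-bit
# binary quantization (placeholders; see the linked mapping file).
os_mapping = {
    "properties": {
        "emb": {                          # placeholder field name
            "type": "knn_vector",
            "dimension": 1536,
            "compression_level": "32x",   # 32 bits -> 1 bit per dimension
        }
    }
}

# "32x" is the float32-to-1-bit ratio:
assert 32 // 1 == 32
```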
Elasticsearch Rally configuration
/k8s/rally-openai_vector-es-bq.yml
Elasticsearch index configuration
index-vectors-only-mapping-with-docid-mapping.json
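On the Elasticsearch side, BBQ is enabled through the dense_vector field's `index_options`. A rough sketch of the relevant part of the mapping is below; the field name, dimension and similarity are placeholders, and the exact settings used in the benchmark are in the linked file:

```python
# Hypothetical sketch of an Elasticsearch dense_vector field with BBQ
# enabled (placeholders; see the linked mapping file).
es_mapping = {
    "properties": {
        "emb": {                                   # placeholder field name
            "type": "dense_vector",
            "dims": 1536,
            "index": True,
            "index_options": {"type": "bbq_hnsw"}, # HNSW graph over BBQ vectors
        }
    }
}
```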
Results
There are multiple ways to interpret the results. For both latency and throughput, we plotted a simplified and a detailed chart at each level of recall. Differences are easiest to see when “higher is better” holds for every metric; however, lower latency is better, while higher throughput is better. For the simplified charts, we therefore used (recall / latency) * 10000 (called simply “speed”) and recall * throughput, so that for both derived metrics, higher means better. Let’s get to it.
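The two derived metrics are trivial to compute; here they are applied to one sample row from the Recall@10 table below (Elasticsearch-9.0-BBQ, 10-100-100):

```python
def speed(recall, latency_ms):
    """Simplified chart metric: (recall / latency) * 10000. Higher is better."""
    return recall / latency_ms * 10000

def weighted_throughput(recall, throughput):
    """Simplified chart metric: recall * throughput. Higher is better."""
    return recall * throughput

# Sample row: latency 11.70 ms, throughput 513.58 ops/s, recall 0.89.
print(round(speed(0.89, 11.70), 1))                 # 760.7
print(round(weighted_throughput(0.89, 513.58), 1))  # 457.1
```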
Recall @ 10 - simplified
At that level of recall, Elasticsearch BBQ is up to 5x faster (3.9x faster on average) and has 3.2x more throughput on average than OpenSearch FAISS.


Recall @ 10 - detailed


engine | task | latency.mean (ms) | throughput.mean (ops/s) | avg_recall |
---|---|---|---|---|
Elasticsearch-9.0-BBQ | 10-100-100 | 11.70 | 513.58 | 0.89 |
Elasticsearch-9.0-BBQ | 10-1000-1000 | 27.33 | 250.55 | 0.95 |
Elasticsearch-9.0-BBQ | 10-1500-1500 | 35.93 | 197.26 | 0.95 |
Elasticsearch-9.0-BBQ | 10-200-200 | 13.33 | 456.16 | 0.92 |
Elasticsearch-9.0-BBQ | 10-2000-2000 | 44.27 | 161.40 | 0.95 |
Elasticsearch-9.0-BBQ | 10-40-40 | 10.97 | 539.94 | 0.84 |
Elasticsearch-9.0-BBQ | 10-50-50 | 11.00 | 535.73 | 0.85 |
Elasticsearch-9.0-BBQ | 10-500-500 | 19.52 | 341.45 | 0.93 |
Elasticsearch-9.0-BBQ | 10-750-750 | 22.94 | 295.19 | 0.94 |
OpenSearch-2.19-faiss | 10-100-100 | 35.59 | 200.61 | 0.94 |
OpenSearch-2.19-faiss | 10-1000-1000 | 156.81 | 58.30 | 0.96 |
OpenSearch-2.19-faiss | 10-1500-1500 | 181.79 | 42.97 | 0.96 |
OpenSearch-2.19-faiss | 10-200-200 | 47.91 | 155.16 | 0.95 |
OpenSearch-2.19-faiss | 10-2000-2000 | 232.14 | 31.84 | 0.96 |
OpenSearch-2.19-faiss | 10-40-40 | 27.55 | 249.25 | 0.92 |
OpenSearch-2.19-faiss | 10-50-50 | 28.78 | 245.14 | 0.92 |
OpenSearch-2.19-faiss | 10-500-500 | 79.44 | 97.06 | 0.96 |
OpenSearch-2.19-faiss | 10-750-750 | 104.19 | 75.49 | 0.96 |
Recall @ 50 - simplified
At that level of recall, Elasticsearch BBQ is up to 5x faster (4.2x faster on average) and has 3.9x more throughput on average than OpenSearch FAISS.


Recall @ 50 - detailed


engine | task | latency.mean (ms) | throughput.mean (ops/s) | avg_recall |
---|---|---|---|---|
Elasticsearch-9.0-BBQ | 50-1000-1000 | 25.71 | 246.44 | 0.95 |
Elasticsearch-9.0-BBQ | 50-1200-1200 | 28.81 | 227.85 | 0.95 |
Elasticsearch-9.0-BBQ | 50-150-150 | 13.43 | 362.90 | 0.90 |
Elasticsearch-9.0-BBQ | 50-1500-1500 | 33.38 | 202.37 | 0.95 |
Elasticsearch-9.0-BBQ | 50-200-200 | 12.99 | 406.30 | 0.91 |
Elasticsearch-9.0-BBQ | 50-2000-2000 | 42.63 | 163.68 | 0.95 |
Elasticsearch-9.0-BBQ | 50-250-250 | 14.41 | 373.21 | 0.92 |
Elasticsearch-9.0-BBQ | 50-500-500 | 17.15 | 341.04 | 0.93 |
Elasticsearch-9.0-BBQ | 50-750-750 | 31.25 | 248.60 | 0.94 |
OpenSearch-2.19-faiss | 50-1000-1000 | 125.35 | 62.53 | 0.96 |
OpenSearch-2.19-faiss | 50-1200-1200 | 143.87 | 54.75 | 0.96 |
OpenSearch-2.19-faiss | 50-150-150 | 43.64 | 130.01 | 0.89 |
OpenSearch-2.19-faiss | 50-1500-1500 | 169.45 | 46.35 | 0.96 |
OpenSearch-2.19-faiss | 50-200-200 | 48.05 | 156.07 | 0.91 |
OpenSearch-2.19-faiss | 50-2000-2000 | 216.73 | 36.38 | 0.96 |
OpenSearch-2.19-faiss | 50-250-250 | 53.52 | 142.44 | 0.93 |
OpenSearch-2.19-faiss | 50-500-500 | 78.98 | 97.82 | 0.95 |
OpenSearch-2.19-faiss | 50-750-750 | 103.20 | 75.86 | 0.96 |
Recall @ 100 - simplified
At that level of recall, Elasticsearch BBQ is up to 5x faster (4.6x faster on average) and has 3.9x more throughput on average than OpenSearch FAISS.


Recall @ 100 - detailed


engine | task | latency.mean (ms) | throughput.mean (ops/s) | avg_recall |
---|---|---|---|---|
Elasticsearch-9.0-BBQ | 100-1000-1000 | 27.82 | 243.22 | 0.95 |
Elasticsearch-9.0-BBQ | 100-1200-1200 | 31.14 | 224.04 | 0.95 |
Elasticsearch-9.0-BBQ | 100-1500-1500 | 35.98 | 193.99 | 0.95 |
Elasticsearch-9.0-BBQ | 100-200-200 | 14.18 | 403.86 | 0.88 |
Elasticsearch-9.0-BBQ | 100-2000-2000 | 45.36 | 159.88 | 0.95 |
Elasticsearch-9.0-BBQ | 100-250-250 | 14.77 | 433.06 | 0.90 |
Elasticsearch-9.0-BBQ | 100-300-300 | 14.61 | 375.54 | 0.91 |
Elasticsearch-9.0-BBQ | 100-500-500 | 18.88 | 340.37 | 0.93 |
Elasticsearch-9.0-BBQ | 100-750-750 | 23.59 | 285.79 | 0.94 |
OpenSearch-2.19-faiss | 100-1000-1000 | 142.90 | 58.48 | 0.95 |
OpenSearch-2.19-faiss | 100-1200-1200 | 153.03 | 51.04 | 0.95 |
OpenSearch-2.19-faiss | 100-1500-1500 | 181.79 | 43.20 | 0.96 |
OpenSearch-2.19-faiss | 100-200-200 | 50.94 | 131.62 | 0.83 |
OpenSearch-2.19-faiss | 100-2000-2000 | 232.53 | 33.67 | 0.96 |
OpenSearch-2.19-faiss | 100-250-250 | 57.08 | 131.23 | 0.87 |
OpenSearch-2.19-faiss | 100-300-300 | 62.76 | 120.10 | 0.89 |
OpenSearch-2.19-faiss | 100-500-500 | 84.36 | 91.54 | 0.93 |
OpenSearch-2.19-faiss | 100-750-750 | 111.33 | 69.95 | 0.94 |
Improvements on BBQ
BBQ has come a long way since its first release in Elasticsearch 8.16. For the sake of comparison, we included a benchmark run from 8.16 alongside the current one, and we can see how recall and latency have improved since then.

In Elasticsearch 8.18 and 9.0, we rewrote the core algorithm for quantizing the vectors. So, while BBQ in 8.16 was good, the newest versions are even better. You can read about it here and here. In short, every vector is individually quantized through optimized scalar quantiles. As a result, users benefit from higher accuracy in vector search without compromising performance, making Elasticsearch’s vector retrieval even more powerful.
Conclusion
In this performance comparison between Elasticsearch BBQ and OpenSearch FAISS, Elasticsearch significantly outperforms OpenSearch for vector search, achieving up to 5x faster query speeds and 3.9x higher throughput on average across various levels of recall.
Key findings include:
- Recall@10: Elasticsearch BBQ is up to 5x faster (3.9x faster on average) and has 3.2x more throughput on average compared to OpenSearch FAISS.
- Recall@50: Elasticsearch BBQ is up to 5x faster (4.2x faster on average) and has 3.9x more throughput on average compared to OpenSearch FAISS.
- Recall@100: Elasticsearch BBQ is up to 5x faster (4.6x faster on average) and has 3.9x more throughput on average compared to OpenSearch FAISS.
These results highlight the efficiency and performance advantages of Elasticsearch BBQ, particularly in high-dimensional vector search scenarios. The Better Binary Quantization (BBQ) technique, introduced in Elasticsearch 8.16, provides substantial memory reduction (~95%) while maintaining high ranking quality, making it a superior choice for large-scale vector search applications.
At Elastic, we are relentlessly innovating to improve Apache Lucene and Elasticsearch to provide the best vector database for search and retrieval use cases, including RAG (Retrieval Augmented Generation). Our recent advancements have dramatically increased performance, making vector search faster and more space efficient than before, building upon the gains from Lucene 10. This blog is another illustration of that innovation.
Try out vector search for yourself using this self-paced hands-on learning for Search AI. You can start a free cloud trial or try Elastic on your local machine now.