Commit f29956e

DOC-5777: search: document new SCORERs (#2204)
* DOC-5777: search: document new SCORERs
* Update the admin. overview page
* Apply suggestions from code review

Co-authored-by: andy-stark-redis <[email protected]>

* Apply suggestions from code review

---------

Co-authored-by: andy-stark-redis <[email protected]>
1 parent 0a9dede commit f29956e

2 files changed

+41
-5
lines changed

content/develop/ai/search-and-query/administration/overview.md

Lines changed: 16 additions & 2 deletions
@@ -239,9 +239,23 @@ These are the pre-bundled scoring functions available in Redis:
239239
*
240240
Identical to the default TFIDF scorer, with one important distinction:
241241

242-
* **BM25**
242+
* **BM25STD (default)**
243243

244-
A variation on the basic TF-IDF scorer. See [this Wikipedia article for more information](https://en.wikipedia.org/wiki/Okapi_BM25).
244+
A variation on the basic `TFIDF` scorer, see [this Wikipedia article for more info](https://en.wikipedia.org/wiki/Okapi_BM25).
245+
246+
The relevance score for each document is multiplied by the presumptive document score and a penalty is applied based on slop as in `TFIDF`.
247+
248+
{{< note >}}
249+
The `BM25` scorer was renamed `BM25STD` in Redis Open Source 8.4. `BM25` is deprecated.
250+
{{< /note >}}
251+
252+
* **BM25STD.NORM**
253+
254+
A variation of `BM25STD`, where the scores are normalized by the minimum and maximum score.
255+
256+
* **BM25STD.TANH**
257+
258+
A variation of `BM25STD.NORM`, where the scores are normalized by the hyperbolic tangent function `tanh(x)`. `BM25STD.TANH` can take an optional argument, `BM25STD_TANH_FACTOR Y`, which is used to smooth the function and the score values. The default value for `Y` is 4.
245259

246260
* **DISMAX**
247261

content/develop/ai/search-and-query/advanced-concepts/scoring.md

Lines changed: 25 additions & 3 deletions
@@ -19,7 +19,7 @@ weight: 8
1919

2020
When searching, documents are scored based on their relevance to the query. The score is a floating point number between 0.0 and 1.0, where 1.0 is the highest score. The score is returned as part of the search results and can be used to sort the results.
2121

22-
Redis Open Source comes with a few very basic scoring functions to evaluate document relevance. They are all based on document scores and term frequency. This is regardless of the ability to use [sortable fields]({{< relref "/develop/ai/search-and-query/advanced-concepts/sorting" >}}). Scoring functions are specified by adding the `SCORER {scorer_name}` argument to a search query.
22+
Redis Open Source comes with a few scoring functions to evaluate document relevance. They are all based on document scores and term frequency. This is regardless of the ability to use [sortable fields]({{< relref "/develop/ai/search-and-query/advanced-concepts/sorting" >}}). Scoring functions are specified by adding the `SCORER {scorer_name}` argument to a search query.
2323

2424
If you prefer a custom scoring function, it is possible to add more functions using the [extension API]({{< relref "/develop/ai/search-and-query/administration/extensions" >}}).
2525

@@ -78,14 +78,36 @@ Term frequencies are normalized by the length of the document, expressed as the
7878
FT.SEARCH myIndex "foo" SCORER TFIDF.DOCNORM
7979
```
8080

81-
## BM25 (default)
81+
## BM25STD (default)
8282

8383
A variation on the basic `TFIDF` scorer, see [this Wikipedia article for more info](https://en.wikipedia.org/wiki/Okapi_BM25).
8484

8585
The relevance score for each document is multiplied by the presumptive document score and a penalty is applied based on slop as in `TFIDF`.
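
For orientation, the per-term score from the linked Okapi BM25 article can be sketched as below. This is the textbook form only, not the Redis implementation: the function name `bm25_term_score`, the parameter names, and the defaults `k1=1.2` and `b=0.75` are assumptions taken from common practice, and the document-score multiplication and slop penalty described above are not modeled.

```python
import math


def bm25_term_score(tf, doc_len, avg_doc_len, n_docs, n_docs_with_term,
                    k1=1.2, b=0.75):
    """Textbook Okapi BM25 score for one query term in one document.

    Hypothetical helper for illustration; k1 and b are commonly used
    defaults, not values confirmed for Redis.
    """
    # Inverse document frequency: rarer terms get a larger weight.
    idf = math.log((n_docs - n_docs_with_term + 0.5)
                   / (n_docs_with_term + 0.5) + 1.0)
    # Length normalization: longer-than-average documents are penalized.
    length_norm = 1.0 - b + b * (doc_len / avg_doc_len)
    # Term frequency saturates as tf grows, controlled by k1.
    return idf * (tf * (k1 + 1.0)) / (tf + k1 * length_norm)
```

A document's total score for a multi-term query would be the sum of this value over the query terms.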
8686

87+
{{< note >}}
88+
The `BM25` scorer was renamed `BM25STD` in Redis Open Source 8.4. `BM25` is deprecated.
89+
{{< /note >}}
90+
91+
```
92+
FT.SEARCH myIndex "foo" SCORER BM25STD
93+
```
94+
95+
## BM25STD.NORM
96+
97+
A variation of `BM25STD`, where the scores are normalized by the minimum and maximum scores.
98+
99+
`BM25STD.NORM` uses min–max normalization across the collection, making it more accurate in distinguishing documents when term frequency distributions vary significantly. Because it depends on global statistics, results adapt better to collection-specific characteristics, but this comes at a performance cost: min and max values must be computed and updated whenever the collection changes. This method is recommended when ranking precision is critical and the dataset is relatively stable.
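
As a rough illustration of min–max normalization in general (not the Redis implementation), the sketch below rescales a list of raw scores into the range 0.0 to 1.0. The function name `min_max_normalize` and the handling of the all-equal case are assumptions made for this example.

```python
def min_max_normalize(scores):
    """Linearly rescale raw scores into [0.0, 1.0] using the observed min and max.

    Hypothetical helper for illustration only.
    """
    if not scores:
        return []
    lo, hi = min(scores), max(scores)
    if hi == lo:
        # No spread to normalize; mapping everything to 0.0 is an
        # arbitrary choice made for this sketch.
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]
```

Because `lo` and `hi` depend on the whole set of scores, adding or removing a document can change every normalized value, which is the cost described above.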
100+
101+
## BM25STD.TANH
102+
103+
A variation of `BM25STD.NORM`, where the scores are normalized by the hyperbolic tangent function `tanh(x)`. `BM25STD.TANH` can take an optional argument, `BM25STD_TANH_FACTOR Y`, which is used to smooth the function and the score values. The default value for `Y` is 4.
104+
105+
`BM25STD.TANH` applies a smooth transformation using the `tanh(x/factor)` function, which avoids collection-dependent statistics and yields faster, more efficient scoring. While this makes it more scalable and consistent across different datasets, the trade-off is reduced accuracy in cases where min–max normalization provides sharper separation. This method is recommended when performance and throughput are prioritized over fine-grained ranking sensitivity.
106+
107+
The following example shows how to use `BM25STD_TANH_FACTOR Y` in a query.
108+
87109
```
88-
FT.SEARCH myIndex "foo" SCORER BM25
110+
FT.SEARCH idx "term" SCORER BM25STD.TANH BM25STD_TANH_FACTOR 12 WITHSCORES
89111
```
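
To see how the factor smooths the curve, the `tanh(x/factor)` transformation described above can be sketched in a few lines. This is an illustration of the formula only, not the Redis implementation; the function name `tanh_normalize` is hypothetical, and the default of 4 mirrors the documented default for `Y`.

```python
import math


def tanh_normalize(score, factor=4.0):
    """Squash a non-negative raw score into [0.0, 1.0) with tanh(score / factor).

    Hypothetical helper for illustration; a larger factor flattens the
    curve, so the same raw score maps to a smaller normalized value.
    """
    return math.tanh(score / factor)


# Compare the default factor with the factor of 12 used in the query above.
for raw in (1.0, 4.0, 12.0):
    print(raw, tanh_normalize(raw), tanh_normalize(raw, factor=12.0))
```

Unlike min–max normalization, each score is transformed independently, so no collection-wide statistics are needed.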
90112

91113
## DISMAX
