Token count field type
A field of type token_count is really an integer field which accepts string values, analyzes them, then indexes the number of tokens in the string.
For instance:
PUT my-index-000001
{
  "mappings": {
    "properties": {
      "name": {
        "type": "text",
        "fields": {
          "length": {
            "type": "token_count",
            "analyzer": "standard"
          }
        }
      }
    }
  }
}

PUT my-index-000001/_doc/1
{ "name": "John Smith" }

PUT my-index-000001/_doc/2
{ "name": "Rachel Alice Williams" }

GET my-index-000001/_search
{
  "query": {
    "term": {
      "name.length": 3
    }
  }
}
- The name field is a text field which uses the default standard analyzer.
- The name.length field is a token_count multi-field which will index the number of tokens in the name field.
- This query matches only the document containing Rachel Alice Williams, as it contains three tokens.
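Because name.length is indexed as an integer, it should also work with numeric queries such as range, and, since doc_values is enabled by default, with aggregations. The request below is a sketch against the my-index-000001 index created above. The aggregation name token_counts is a hypothetical label chosen for illustration.

GET my-index-000001/_search
{
  "query": {
    "range": {
      "name.length": {
        "gte": 2,
        "lte": 3
      }
    }
  },
  "aggs": {
    "token_counts": {
      "terms": {
        "field": "name.length"
      }
    }
  }
}

With the two documents indexed above (two and three tokens respectively), both should match the range query, and the terms aggregation should return one bucket for 2 and one for 3.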
The following parameters are accepted by token_count fields:
- analyzer: The analyzer which should be used to analyze the string value. Required. For best performance, use an analyzer without token filters.
- enable_position_increments: Indicates if position increments should be counted. Set to false if you don't want to count tokens removed by analyzer filters (like stop). Defaults to true.
- doc_values: Should the field be stored on disk in a column-stride fashion, so that it can later be used for sorting, aggregations, or scripting? Accepts true (default) or false.
- index: Should the field be searchable? Accepts true (default) and false.
- null_value: Accepts a numeric value of the same type as the field which is substituted for any explicit null values. Defaults to null, which means the field is treated as missing.
- store: Whether the field value should be stored and retrievable separately from the _source field. Accepts true or false (default).
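To illustrate enable_position_increments, the following sketch maps two token_count multi-fields that use the built-in stop analyzer. One sets enable_position_increments to false and the other keeps the default. The index name my-index-000002 and the field names title, length, and length_with_gaps are hypothetical and chosen only for this example.

PUT my-index-000002
{
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "fields": {
          "length": {
            "type": "token_count",
            "analyzer": "stop",
            "enable_position_increments": false
          },
          "length_with_gaps": {
            "type": "token_count",
            "analyzer": "stop"
          }
        }
      }
    }
  }
}

PUT my-index-000002/_doc/1
{ "title": "The quick fox" }

The stop analyzer removes "the", leaving two tokens. Under the behavior described for enable_position_increments, title.length should therefore index 2, because the removed token is not counted. title.length_with_gaps keeps the default of true and should index 3, because the gap left by the removed stop word is still counted.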
token_count fields support synthetic _source in their default configuration.