Character filter reference
Character filters are used to preprocess the stream of characters before it is passed to the tokenizer.
A character filter receives the original text as a stream of characters and can transform the stream by adding, removing, or changing characters. For instance, a character filter could be used to convert Arabic-Indic digits (٠١٢٣٤٥٦٧٨٩) into their Western Arabic (Latin) equivalents (0123456789), or to strip HTML elements like <b> from the stream.
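The idea can be sketched in plain Python. This is only an illustration of the concept, not how Elasticsearch implements it: the function names (`strip_tags`, `convert_digits`, `tokenize`, `analyze`) are hypothetical, the tag pattern is deliberately naive, and whitespace splitting stands in for a real tokenizer.

```python
import re

# Translation table from Arabic-Indic digits (U+0660..U+0669) to ASCII 0-9.
DIGIT_TABLE = {0x0660 + i: str(i) for i in range(10)}

# Naive tag pattern, for illustration only; real HTML needs a proper parser.
TAG_PATTERN = re.compile(r"<[^>]+>")


def strip_tags(text: str) -> str:
    """Character filter: remove anything that looks like an HTML tag."""
    return TAG_PATTERN.sub("", text)


def convert_digits(text: str) -> str:
    """Character filter: replace Arabic-Indic digits with ASCII digits."""
    return text.translate(DIGIT_TABLE)


def tokenize(text: str) -> list[str]:
    """Stand-in tokenizer: split on whitespace."""
    return text.split()


def analyze(text: str) -> list[str]:
    """Run the character filters in order, then hand the result to the tokenizer."""
    for char_filter in (strip_tags, convert_digits):
        text = char_filter(text)
    return tokenize(text)


print(analyze("<b>\u0662\u0665</b> items"))  # → ['25', 'items']
```

The point of the sketch is the ordering: every character filter sees and rewrites the raw text before the tokenizer produces any tokens.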
Elasticsearch has a number of built-in character filters that can be used to build custom analyzers.
- HTML Strip Character Filter
  The html_strip character filter strips out HTML elements like <b> and decodes HTML entities like &amp;.
- Mapping Character Filter
  The mapping character filter replaces any occurrences of the specified strings with the specified replacements.
- Pattern Replace Character Filter
  The pattern_replace character filter replaces any characters matching a regular expression with the specified replacement.
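One way to try the three filters is to send each one to the `_analyze` API together with the `keyword` tokenizer, so that the single returned token shows only the effect of the character filter. The Python sketch below only builds the JSON request bodies. It assumes the `_analyze` API and the parameter names `mappings`, `pattern`, and `replacement`, none of which are stated in this section, so check them against the reference page for each filter. The helper `analyze_request` and the sample texts are hypothetical.

```python
import json


def analyze_request(char_filter, text):
    """Build a body for an _analyze request (assumed API) that applies one
    character filter and then the keyword tokenizer."""
    return {"tokenizer": "keyword", "char_filter": [char_filter], "text": text}


# html_strip needs no configuration, so it is referenced by name.
html_strip_body = analyze_request("html_strip", "<p>I&apos;m so <b>happy</b>!</p>")

# mapping: one "key => value" rule per Arabic-Indic digit (U+0660..U+0669).
digit_rules = [f"{chr(0x0660 + i)} => {i}" for i in range(10)]
mapping_body = analyze_request(
    {"type": "mapping", "mappings": digit_rules},
    "My license plate is \u0662\u0665\u0660\u0661\u0665",
)

# pattern_replace: turn a hyphen between digits into an underscore.
pattern_body = analyze_request(
    {"type": "pattern_replace", "pattern": r"(\d+)-(?=\d)", "replacement": "$1_"},
    "My credit card is 123-456-789",
)

for body in (html_strip_body, mapping_body, pattern_body):
    # Each printed document is a request body that could be sent to _analyze.
    print(json.dumps(body, indent=2))
```

Using the `keyword` tokenizer is a design choice for inspection only. In a custom analyzer, the same filter definitions would sit in front of whichever tokenizer the analyzer actually uses.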