Commit 32eb5ff

[Docs] Document which encoding should be used in order to make sense of the offsets returned by the term vectors API.
Close elastic#4363
1 parent a1d4731 commit 32eb5ff

1 file changed: +8 -0 lines changed

docs/reference/search/termvectors.asciidoc

Lines changed: 8 additions & 0 deletions
@@ -41,6 +41,14 @@ If the requested information wasn't stored in the index, it will be
 omitted without further warning. See <<mapping-types,type mapping>>
 for how to configure your index to store term vectors.

+[WARNING]
+======
+Start and end offsets assume UTF-16 encoding is being used. If you want to use
+these offsets in order to get the original text that produced this token, you
+should make sure that the string you are taking a sub-string of is also encoded
+using UTF-16.
+======
+
 [float]
 ==== Term statistics
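
The warning added above can be illustrated with a minimal sketch, which is not part of the commit. It assumes a client language whose strings are not indexed by UTF-16 code units (Python here, which indexes by code point), and the helper name `slice_by_utf16_offsets` and the sample text are hypothetical. The idea is to encode the original text as UTF-16 before applying the start and end offsets, since each UTF-16 code unit is two bytes.

```python
def slice_by_utf16_offsets(text: str, start: int, end: int) -> str:
    """Return the part of `text` covered by UTF-16 code unit offsets.

    `start` and `end` are assumed to be token offsets counted in
    UTF-16 code units, as described in the warning above.
    """
    # "utf-16-le" emits exactly 2 bytes per code unit and no byte order mark.
    encoded = text.encode("utf-16-le")
    return encoded[2 * start : 2 * end].decode("utf-16-le")


# Hypothetical sample: the emoji takes two UTF-16 code units but is a
# single Python character, so plain slicing drifts by one position.
text = "😀 term vectors"

utf16_slice = slice_by_utf16_offsets(text, 3, 7)  # offsets in UTF-16 units
naive_slice = text[3:7]                           # offsets in code points
```

For text that contains only characters in the Basic Multilingual Plane, both slices agree. They diverge once a character outside it (such as an emoji) appears before the token.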
