Use optimized text in match_only_text fields #129371

Merged: 8 commits merged into elastic:main on Jun 17, 2025

Conversation

@jordan-powers jordan-powers (Contributor) commented Jun 12, 2025

Follow-up to #126492 to use the JSON parsing optimizations for match_only_text fields.

Relates to #129072.

@jordan-powers jordan-powers requested a review from martijnvg June 12, 2025 21:52
@jordan-powers jordan-powers self-assigned this Jun 12, 2025
@jordan-powers jordan-powers requested a review from a team as a code owner June 12, 2025 21:52
@jordan-powers jordan-powers added the >non-issue, auto-backport (Automatically create backport pull requests when merged), :StorageEngine/Mapping (The storage related side of mappings), v8.19.0, and v9.1.0 labels Jun 12, 2025
@elasticsearchmachine (Collaborator) commented

Pinging @elastic/es-storage-engine (Team:StorageEngine)

```java
/**
 * Reader that decodes UTF-8 formatted bytes into chars.
 */
public class UTF8DecodingReader extends Reader {
```
@jordan-powers jordan-powers (Contributor, Author) commented Jun 12, 2025

This is basically equivalent to:

```java
final var reader = new InputStreamReader(new ByteArrayInputStream(utfBytes.bytes(), utfBytes.offset(), utfBytes.length()), StandardCharsets.UTF_8);
```

But according to the microbenchmarks, the InputStreamReader/ByteArrayInputStream approach is really slow; it was actually slower than the original string-based implementation.
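For readers following along, the idea is to decode straight out of the byte array with a CharsetDecoder, skipping the stream and reader wrappers entirely. A minimal sketch of such a reader, assuming a plain byte[] input (the class name and constructor are illustrative placeholders, not this PR's actual code):

```java
import java.io.Reader;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch: decode UTF-8 bytes to chars without stream wrappers. */
final class Utf8BytesReader extends Reader {
    private final CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
        .onMalformedInput(CodingErrorAction.REPLACE)
        .onUnmappableCharacter(CodingErrorAction.REPLACE);
    private final ByteBuffer bytes;

    Utf8BytesReader(byte[] utf8, int offset, int length) {
        this.bytes = ByteBuffer.wrap(utf8, offset, length);
    }

    @Override
    public int read(char[] cbuf, int off, int len) {
        if (bytes.hasRemaining() == false) {
            return -1; // all input consumed
        }
        CharBuffer out = CharBuffer.wrap(cbuf, off, len);
        // endOfInput=true because the whole byte range is available up front
        decoder.decode(bytes, out, true);
        return out.position() - off;
    }

    @Override
    public void close() {
        // nothing to release: no underlying stream
    }
}
```

The saving comes from decoding in place: no ByteArrayInputStream allocation and no extra buffering layer sitting between the bytes and the decoder.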

@jordan-powers jordan-powers (Contributor, Author) commented

Here are the results from the microbenchmark:

```
Benchmark                                                   (nDocs)  Mode  Cnt    Score   Error  Units
OptimizedTextBenchmark.indexDocuments (baseline)            1048576  avgt    5  581.242 ± 3.050  ms/op
OptimizedTextBenchmark.indexDocuments (UTF8DecodingReader)  1048576  avgt    5  544.477 ± 4.961  ms/op
OptimizedTextBenchmark.indexDocuments (InputStreamReader)   1048576  avgt    5  852.380 ± 6.238  ms/op
```
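The reader-versus-reader gap can also be reproduced in isolation. A hypothetical JMH harness along these lines (the data generation and class names are made up for illustration; this is not the PR's actual OptimizedTextBenchmark) isolates just the decode path that separates the two reader variants above:

```java
import java.io.ByteArrayInputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.TimeUnit;
import org.openjdk.jmh.annotations.*;
import org.openjdk.jmh.infra.Blackhole;

@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.MICROSECONDS)
@State(Scope.Benchmark)
@Fork(1)
@Warmup(iterations = 3)
@Measurement(iterations = 5)
public class Utf8DecodeBenchmark {
    byte[] utf8;
    char[] buf = new char[1024];

    @Setup
    public void setup() {
        // Synthetic UTF-8 payload standing in for parsed field values
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 10_000; i++) {
            sb.append("sample log line number ").append(i).append('\n');
        }
        utf8 = sb.toString().getBytes(StandardCharsets.UTF_8);
    }

    @Benchmark
    public void inputStreamReader(Blackhole bh) throws Exception {
        // The wrapped approach from the comment above
        Reader r = new InputStreamReader(new ByteArrayInputStream(utf8), StandardCharsets.UTF_8);
        for (int n; (n = r.read(buf)) != -1; ) {
            bh.consume(n);
        }
    }

    @Benchmark
    public void directDecoder(Blackhole bh) throws Exception {
        // The Utf8BytesReader sketch from earlier in the thread
        Reader r = new Utf8BytesReader(utf8, 0, utf8.length);
        for (int n; (n = r.read(buf)) != -1; ) {
            bh.consume(n);
        }
    }
}
```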

@martijnvg martijnvg (Member) commented

Good catch. Yes, this implementation makes a lot of sense then. Did you also run this PR against elastic/logs (enterprise) benchmark?

@jordan-powers jordan-powers (Contributor, Author) commented

I haven't yet, but I plan to run it today.

@martijnvg martijnvg (Member) commented
Maybe make this a final class?

@jordan-powers jordan-powers changed the title Text fields optimized text Use optimized text in match_only_text fields Jun 12, 2025
@martijnvg martijnvg (Member) left a comment:
Looks good. I left a few questions.

@martijnvg martijnvg (Member) left a comment:
LGTM 👍

@martijnvg martijnvg (Member) commented

Looks like the serverless check failed because of:

```
[2025-06-13T22:21:59,803][ERROR][o.e.b.ElasticsearchUncaughtExceptionHandler] [search-TgDYeLu] fatal error in thread [elasticsearch[search-TgDYeLu][esql_worker][T#95]], exiting
java.lang.AssertionError: java.lang.IllegalStateException: can't release already released object [IntArrayBlock[positions=35, mvOrdering=UNORDERED, vector=IntArrayVector[positions=35, values=[5, 3, 0, 1, 5, 4, 5, 5, 5, 5, ...]]]]
	at org.elasticsearch.core.Releasables.closeExpectNoException(Releasables.java:64) ~[elasticsearch-core-9.1.0-SNAPSHOT.jar:?]
	at org.elasticsearch.compute.operator.AbstractPageMappingToIteratorOperator$AppendBlocksIterator.close(AbstractPageMappingToIteratorOperator.java:335) ~[?:?]
	at org.elasticsearch.compute.operator.AbstractPageMappingToIteratorOperator$RuntimeTrackingIterator.close(AbstractPageMappingToIteratorOperator.java:161) ~[?:?]
	at org.elasticsearch.core.Releasables.close(Releasables.java:39) ~[elasticsearch-core-9.1.0-SNAPSHOT.jar:?]
	at org.elasticsearch.core.Releasables.closeExpectNoException(Releasables.java:72) ~[elasticsearch-core-9.1.0-SNAPSHOT.jar:?]
	at org.elasticsearch.compute.operator.AbstractPageMappingToIteratorOperator.close(AbstractPageMappingToIteratorOperator.java:136) ~[?:?]
	at org.elasticsearch.compute.operator.Driver.closeEarlyFinishedOperators(Driver.java:317) ~[?:?]
	at org.elasticsearch.compute.operator.Driver.runSingleLoopIteration(Driver.java:290) ~[?:?]
	at org.elasticsearch.compute.operator.Driver.run(Driver.java:185) ~[?:?]
	at org.elasticsearch.compute.operator.Driver$1.doRun(Driver.java:406) ~[?:?]
	at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:27) ~[elasticsearch-9.1.0-SNAPSHOT.jar:?]
	at org.elasticsearch.compute.operator.DriverScheduler$1.doRun(DriverScheduler.java:57) ~[?:?]
	at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:27) ~[elasticsearch-9.1.0-SNAPSHOT.jar:?]
	at org.elasticsearch.common.util.concurrent.TimedRunnable.doRun(TimedRunnable.java:35) ~[elasticsearch-9.1.0-SNAPSHOT.jar:?]
	at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:1044) ~[elasticsearch-9.1.0-SNAPSHOT.jar:?]
	at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:27) ~[elasticsearch-9.1.0-SNAPSHOT.jar:?]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1095) ~[?:?]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:619) ~[?:?]
	at java.lang.Thread.run(Thread.java:1447) ~[?:?]
Caused by: java.lang.IllegalStateException: can't release already released object [IntArrayBlock[positions=35, mvOrdering=UNORDERED, vector=IntArrayVector[positions=35, values=[5, 3, 0, 1, 5, 4, 5, 5, 5, 5, ...]]]]
	at org.elasticsearch.compute.data.AbstractNonThreadSafeRefCounted.decRef(AbstractNonThreadSafeRefCounted.java:40) ~[?:?]
	at org.elasticsearch.compute.data.AbstractNonThreadSafeRefCounted.close(AbstractNonThreadSafeRefCounted.java:59) ~[?:?]
	at org.elasticsearch.core.Releasables.close(Releasables.java:39) ~[elasticsearch-core-9.1.0-SNAPSHOT.jar:?]
	at org.elasticsearch.core.Releasables.close(Releasables.java:48) ~[elasticsearch-core-9.1.0-SNAPSHOT.jar:?]
	at org.elasticsearch.compute.data.IntLookup.close(IntLookup.java:96) ~[?:?]
	at org.elasticsearch.core.Releasables.close(Releasables.java:39) ~[elasticsearch-core-9.1.0-SNAPSHOT.jar:?]
	at org.elasticsearch.core.Releasables.close(Releasables.java:48) ~[elasticsearch-core-9.1.0-SNAPSHOT.jar:?]
	at org.elasticsearch.core.Releasables.closeExpectNoException(Releasables.java:62) ~[elasticsearch-core-9.1.0-SNAPSHOT.jar:?]
	... 18 more
```

This killed the search node, and many tests failed as a result. I don't think it is related to this change.

@jordan-powers jordan-powers (Contributor, Author) commented

Ok, I benchmarked this with the elastic/logs benchmark, and it doesn't seem to have sped things up much. If anything, it slowed throughput by about 1.5%.

In light of that, I'm not sure we want to continue forward with this change.

|                                                        Metric |                                   Task |        Baseline |       Contender |        Diff |   Unit |   Diff % |
|--------------------------------------------------------------:|---------------------------------------:|----------------:|----------------:|------------:|-------:|---------:|
|                    Cumulative indexing time of primary shards |                                        |   843.881       |   845.464       |     1.58262 |    min |   +0.19% |
|             Min cumulative indexing time across primary shard |                                        |     2.76688     |     2.80793     |     0.04105 |    min |   +1.48% |
|          Median cumulative indexing time across primary shard |                                        |    10.8209      |    10.6022      |    -0.21865 |    min |   -2.02% |
|             Max cumulative indexing time across primary shard |                                        |   155.056       |   148.468       |    -6.58773 |    min |   -4.25% |
|           Cumulative indexing throttle time of primary shards |                                        |     0           |     0           |     0       |    min |    0.00% |
|    Min cumulative indexing throttle time across primary shard |                                        |     0           |     0           |     0       |    min |    0.00% |
| Median cumulative indexing throttle time across primary shard |                                        |     0           |     0           |     0       |    min |    0.00% |
|    Max cumulative indexing throttle time across primary shard |                                        |     0           |     0           |     0       |    min |    0.00% |
|                       Cumulative merge time of primary shards |                                        |   271.282       |   282.542       |    11.26    |    min |   +4.15% |
|                      Cumulative merge count of primary shards |                                        |   452           |   447           |    -5       |        |   -1.11% |
|                Min cumulative merge time across primary shard |                                        |     0.421517    |     0.507383    |     0.08587 |    min |  +20.37% |
|             Median cumulative merge time across primary shard |                                        |     1.9479      |     2.5712      |     0.6233  |    min |  +32.00% |
|                Max cumulative merge time across primary shard |                                        |    60.6158      |    65.9096      |     5.29378 |    min |   +8.73% |
|              Cumulative merge throttle time of primary shards |                                        |    83.5016      |    85.6329      |     2.13128 |    min |   +2.55% |
|       Min cumulative merge throttle time across primary shard |                                        |     0.0956      |     0.100017    |     0.00442 |    min |   +4.62% |
|    Median cumulative merge throttle time across primary shard |                                        |     0.453217    |     0.591433    |     0.13822 |    min |  +30.50% |
|       Max cumulative merge throttle time across primary shard |                                        |    17.9542      |    20.317       |     2.36283 |    min |  +13.16% |
|                     Cumulative refresh time of primary shards |                                        |    12.0246      |    11.3049      |    -0.71975 |    min |   -5.99% |
|                    Cumulative refresh count of primary shards |                                        |  6773           |  6487           |  -286       |        |   -4.22% |
|              Min cumulative refresh time across primary shard |                                        |     0.0152333   |     0.0136      |    -0.00163 |    min |  -10.72% |
|           Median cumulative refresh time across primary shard |                                        |     0.0656667   |     0.1037      |     0.03803 |    min |  +57.92% |
|              Max cumulative refresh time across primary shard |                                        |     3.19272     |     2.74677     |    -0.44595 |    min |  -13.97% |
|                       Cumulative flush time of primary shards |                                        |   164.641       |   158.741       |    -5.90037 |    min |   -3.58% |
|                      Cumulative flush count of primary shards |                                        |  6375           |  6025           |  -350       |        |   -5.49% |
|                Min cumulative flush time across primary shard |                                        |     0.60075     |     0.605083    |     0.00433 |    min |   +0.72% |
|             Median cumulative flush time across primary shard |                                        |     2.14908     |     2.19497     |     0.04588 |    min |   +2.14% |
|                Max cumulative flush time across primary shard |                                        |    27.8946      |    25.8562      |    -2.03843 |    min |   -7.31% |
|                                       Total Young Gen GC time |                                        |   304.427       |   312.299       |     7.872   |      s |   +2.59% |
|                                      Total Young Gen GC count |                                        |  9128           |  9292           |   164       |        |   +1.80% |
|                                         Total Old Gen GC time |                                        |     0           |     0           |     0       |      s |    0.00% |
|                                        Total Old Gen GC count |                                        |     0           |     0           |     0       |        |    0.00% |
|                                                  Dataset size |                                        |    66.4907      |    66.1832      |    -0.30756 |     GB |   -0.46% |
|                                                    Store size |                                        |    66.4907      |    66.1832      |    -0.30756 |     GB |   -0.46% |
|                                                 Translog size |                                        |     1.85905     |     4.093       |     2.23396 |     GB | +120.17% |
|                                        Heap used for segments |                                        |     0           |     0           |     0       |     MB |    0.00% |
|                                      Heap used for doc values |                                        |     0           |     0           |     0       |     MB |    0.00% |
|                                           Heap used for terms |                                        |     0           |     0           |     0       |     MB |    0.00% |
|                                           Heap used for norms |                                        |     0           |     0           |     0       |     MB |    0.00% |
|                                          Heap used for points |                                        |     0           |     0           |     0       |     MB |    0.00% |
|                                   Heap used for stored fields |                                        |     0           |     0           |     0       |     MB |    0.00% |
|                                                 Segment count |                                        |  1405           |  1496           |    91       |        |   +6.48% |
|                                   Total Ingest Pipeline count |                                        |     4.88622e+08 |     4.88622e+08 |     0       |        |    0.00% |
|                                    Total Ingest Pipeline time |                                        |     2.86538e+07 |     2.87425e+07 | 88671       |     ms |   +0.31% |
|                                  Total Ingest Pipeline failed |                                        |     0           |     0           |     0       |        |    0.00% |
|                                                Min Throughput |                             bulk-index |  1051.65        |   955.535       |   -96.119   | docs/s |   -9.14% |
|                                               Mean Throughput |                             bulk-index | 33868.7         | 33389.5         |  -479.148   | docs/s |   -1.41% |
|                                             Median Throughput |                             bulk-index | 33915.6         | 33412.8         |  -502.861   | docs/s |   -1.48% |
|                                                Max Throughput |                             bulk-index | 37093.8         | 36478.7         |  -615.09    | docs/s |   -1.66% |
|                                       50th percentile latency |                             bulk-index |  1530.83        |  1536.39        |     5.55938 |     ms |   +0.36% |
|                                       90th percentile latency |                             bulk-index |  2714.8         |  2728.48        |    13.6765  |     ms |   +0.50% |
|                                       99th percentile latency |                             bulk-index |  4725.48        |  4777.42        |    51.9447  |     ms |   +1.10% |
|                                     99.9th percentile latency |                             bulk-index |  8483.2         |  8272.94        |  -210.258   |     ms |   -2.48% |
|                                    99.99th percentile latency |                             bulk-index | 11502.3         | 11182.9         |  -319.377   |     ms |   -2.78% |
|                                      100th percentile latency |                             bulk-index | 16531.9         | 17475.5         |   943.6     |     ms |   +5.71% |
|                                  50th percentile service time |                             bulk-index |  1531.29        |  1537.26        |     5.96548 |     ms |   +0.39% |
|                                  90th percentile service time |                             bulk-index |  2714.72        |  2727.99        |    13.2743  |     ms |   +0.49% |
|                                  99th percentile service time |                             bulk-index |  4694.21        |  4784.81        |    90.6033  |     ms |   +1.93% |
|                                99.9th percentile service time |                             bulk-index |  8482.51        |  8262.62        |  -219.887   |     ms |   -2.59% |
|                               99.99th percentile service time |                             bulk-index | 11505.4         | 11157.4         |  -347.979   |     ms |   -3.02% |
|                                 100th percentile service time |                             bulk-index | 16531.9         | 17475.5         |   943.6     |     ms |   +5.71% |
|                                                    error rate |                             bulk-index |     0           |     0           |     0       |      % |    0.00% |

@martijnvg martijnvg (Member) commented

> If anything, it slowed down the throughput by 1.5%.

Given that even median indexing throughput varies between runs, I think this is within noise. I don't think this change makes things slower. I suspect we don't see a gain here because there are many other moving parts, and the micro benchmark suggested only a minor improvement. I think not having to create Java strings during indexing for match_only_text fields is beneficial. Additionally, block loaders for match_only_text fields can work with bytes refs instead of strings, which I think is beneficial as well (see the sketch after this comment).

> In light of that, I'm not sure if we want to continue forward with this change.

So I think we can move forward with the change and merge it in.
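To make the intuition concrete, here is a hypothetical sketch of the allocation being avoided, written against plain Lucene APIs. The field name, the class names, and the Utf8BytesReader helper are illustrative placeholders from earlier in the thread, not this PR's actual MatchOnlyTextFieldMapper code:

```java
import java.nio.charset.StandardCharsets;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;

class MatchOnlyTextIndexingSketch {

    // Before: the field value is copied out of the UTF-8 source bytes
    // into a java.lang.String before Lucene ever sees it.
    static Field fromString(byte[] utf8, int offset, int length) {
        String value = new String(utf8, offset, length, StandardCharsets.UTF_8);
        return new TextField("message", value, Field.Store.NO);
    }

    // After: the analyzer pulls chars lazily through a Reader, so no
    // intermediate String is materialized on the indexing path.
    static Field fromBytes(byte[] utf8, int offset, int length) {
        return new TextField("message", new Utf8BytesReader(utf8, offset, length));
    }
}
```

The same source bytes can then also be handed to block loaders as raw bytes rather than round-tripping through a String, which is the second benefit mentioned above.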

@jordan-powers jordan-powers (Contributor, Author) commented

Ok, I'll merge it and keep an eye on the nightlies. We can always revert it if we see something we don't like.

@jordan-powers jordan-powers merged commit 5d19997 into elastic:main Jun 17, 2025
24 checks passed
@elasticsearchmachine (Collaborator) commented

💚 Backport successful

Branch 8.19: successful

elasticsearchmachine pushed a commit that referenced this pull request Jun 18, 2025
Follow-up to #126492 to use the json parsing optimizations for
match_only_text fields.

Relates to #129072.