-
Notifications
You must be signed in to change notification settings - Fork 25.3k
Avoid dropping aggregate groupings in local plans #129370
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Hi @dnhatn, I've created a changelog YAML for you. |
Pinging @elastic/es-analytical-engine (Team:Analytics) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Had a first look at it, but I need more time to understand it.
It seems to be doing what the comment in RuleUtils.aliasedNulls is suggesting, which is good.
When I tested this I really had to use "max_concurrent_shards_per_node":1
(as you have it in your IT test). Why is this bit needed, if you can shed some light?
@@ -1680,6 +1680,36 @@ public void testQueryOnEmptyDataIndex() { | |||
} | |||
} | |||
|
|||
public void testGroupingStatsOnMissingFields() { | |||
assertAcked(client().admin().indices().prepareCreate("test_missing_fields").setMapping("data", "type=long")); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think here you meant missing_field_index
, right?
var field = as(project.child(), FieldExtractExec.class); | ||
var fields = field.attributesToExtract(); | ||
assertThat(Expressions.names(fields), contains("_meta_field", "gender", "hire_date", "job", "job.raw", "languages", "long_noidx")); | ||
var eval = as(field.child(), EvalExec.class); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you also, please, change the "Expects" part of this method at the top? It's a thing with the optimizer tests that, at least for me, is very useful when looking at a test.
Nice find of the cause!
I think skipping So IMO we should at a minimum apply 1. Then, the |
Hey, nice find @dnhatn !
I believe this is a good quick fix.
I wouldn't disable this locally altogether. There are very valid reasons to combine projections on the local node; for one, I think the crux is that on a local node Thus, I think a more robust approach would be to mark such I know this requires a new transport version, but we could add a |
I think
|
It adds a logical projection into the fragment here. Currently, this top projection gets normally combined with anything that rules like
Oh, right - I thought this wasn't an option because there may be other aggs in the local plan, but of course there can be at most one aggregate 🤦 . So, a local flavor of |
Laterally, I was also wondering if we couldn't do a verification of the outputs before and after local replanning. |
The local plan optimizer should not change the layout, as it has already been agreed upon. However, CombineProjections can violate this when some grouping elements refer to the same attribute. This occurs when ReplaceFieldWithConstantOrNull replaces missing fields with the same reference for a given data type.
I see several ways to address this issue:
Skip CombineProjections in the local plan. This optimization is useful in the global plan, but likely unnecessary in the local plan, especially since we now need to retain the same number of aggregates and groupings.
Update RuleUtils#aliasedNulls to generate an eval for each missing field separately, rather than one per data type.
This change adopts option 2 as it is simpler and more robust. However, either option is fine, and I am open to other suggestions.
Closes #128054