Skip to content

ESQL: Fix some more usages of field attribute names in LOOKUP JOIN #129355

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 22 commits into from
Jun 17, 2025

Conversation

alex-spies
Copy link
Contributor

@alex-spies alex-spies commented Jun 12, 2025

  • Add a test that demonstrates that LOOKUP JOIN works with union types.
  • Fix usage of FieldAttribute#name where we should use FieldAttribute#fieldName in LOOKUP JOIN.
  • Refactor LookupJoinTypesIT

This is not a bug fix - it's future-proofing.

LOOKUP JOIN with union types already works fine due to the fact that the union type field attributes never make it into the Join node. E.g. for FROM indices* | EVAL field = field::integer with a multi-typed field we create a field attribute called $$field$$converted_to$integer - but if we add the command LOOKUP JOIN lookup_index ON field, field will refer to the reference attribute field created in the EVAL.

However, our implementation of LOOKUP JOIN still relies on FieldAttribute#name rather than FieldAttribute#fieldName when telling the compute engine which physical index fields to use. The latter is always correct, the former is wrong for union types. To be able to propagate union type field attributes into the join, we need to use FieldAttribute#fieldName - and that's what this PR does.

This is relevant for when LOOKUP JOIN gets generalized and starts using proper expressions, like LOOKUP JOIN lookup_index ON match_field::keyword == other_field.

Also, it's beneficial not to rely on FieldAttribute#name when telling the compute engine which fields to fetch from the lookup index. It means we can handily rename the field attributes while still referring to the same physical fields.

In LOOKUP JOIN, union type fields don't work as match fields because we
extract their names wrong. Correct this and make MatchConfig rely on
FieldName to avoid this problem going forward.
@alex-spies alex-spies added >refactoring auto-backport Automatically create backport pull requests when merged :Analytics/ES|QL AKA ESQL v8.19.0 v9.1.0 v9.0.3 v8.18.3 labels Jun 12, 2025
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-analytical-engine (Team:Analytics)

@elasticsearchmachine elasticsearchmachine added the Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) label Jun 12, 2025
private MatchConfig(FieldAttribute match, Layout.ChannelAndType input) {
// Note, this handles TEXT fields with KEYWORD subfields
this(match.exactAttribute().name(), input.channel(), input.type());
this(match.exactAttribute().fieldName(), input.channel(), input.type());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is not quite obvious to me how the keyword subfield of a text field works with multi-typed field, it will be great if we have a test to exercise this code path, or some comments for when match.exactAttribute().name() is different from match.exactAttribute().fieldName().

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Honestly, I think this line is super redundant. The matchfield is a field from the lookup index. We can't even use TEXT as a lookup key in the lookup index, it's only allowed in the main index. So the dropping down to the exact subfield just doesn't ever apply.

I'll expand the comment. Testing this code path is currently not really possible (at least not in a realistic scenario.)

In any case, using fieldName over name is going to be more correct once the call to exactAttribute stops being redundant.

# union type behavior
###############################################

joinOnMultiTypedMatchFieldCastToInteger
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for adding tests for multi-typed numeric fields as join keys. We still have gaps in test coverage for multi-typed join keys involving string types (keyword, text, ip) and date types (millis, nanos). If these are not within the scope of this PR, they should be covered in a separate one.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a more complete list of tests to LookupJoinTypesIT. It's still not exhaustive but IMO covers enough ground for now.

That required a bit of refactoring ^^

Comment on lines +148 to +158
// Union types; non-exhaustive and can be extended
{
TestConfigs configs = testConfigurations.computeIfAbsent("union-types", TestConfigs::new);
configs.addUnionTypePasses(SHORT, INTEGER, INTEGER);
configs.addUnionTypePasses(BYTE, DOUBLE, LONG);
configs.addUnionTypePasses(DATETIME, DATE_NANOS, DATE_NANOS);
configs.addUnionTypePasses(DATE_NANOS, DATETIME, DATETIME);
configs.addUnionTypePasses(SCALED_FLOAT, HALF_FLOAT, DOUBLE);
configs.addUnionTypePasses(TEXT, KEYWORD, KEYWORD);
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Union-type test cases added here!

@@ -270,16 +241,16 @@ protected Collection<Class<? extends Plugin>> nodePlugins() {
Set<String> knownTypes = new HashSet<>();
for (TestConfigs configs : testConfigurations.values()) {
for (TestConfig config : configs.configs.values()) {
if (knownTypes.contains(config.indexName())) {
throw new IllegalArgumentException("Duplicate index name: " + config.indexName());
if (knownTypes.contains(config.lookupIndexName())) {
Copy link
Member

@fang-xing-esql fang-xing-esql Jun 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently, we only enforce uniqueness for lookupIndexName. If this restriction is intended to apply only to lookup indices, it might be helpful to update the error message to make that explicit—for example:
"Duplicate lookup index name:"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll update the message!

I think long term, we should refactor the test setup as it creates too many indices. We only really need 1 main index (with a field for every relevant type), 1 lookup index (again, 1 field for every relevant type) and only additional indices for union type cases (because we really need mapping conflicts there).

That would simplify the index creation, validation etc. I initially wanted to do this change in this PR, but I already introduced a lot of changes so that's better to do another time IMHO.

Copy link
Member

@fang-xing-esql fang-xing-esql left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for adding more tests around union-typed fields, @alex-spies! I’ve added a few comments in LookupJoinTypesIT, mostly trying to improve readability.

@alex-spies
Copy link
Contributor Author

alex-spies commented Jun 16, 2025

I included @dnhatn 's fix from #129437 since we're touching the same code and will immediately create conflicts for one another.

Nevermind, Nhat got his fix in before me - but I'll deal with the merge conflicts :)

Copy link
Member

@fang-xing-esql fang-xing-esql left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thank you @alex-spies !

@alex-spies alex-spies merged commit 65cbccc into elastic:main Jun 17, 2025
24 checks passed
@alex-spies alex-spies deleted the fix-lu-join-with-union-types branch June 17, 2025 08:30
@elasticsearchmachine
Copy link
Collaborator

💔 Backport failed

Status Branch Result
8.19 Commit could not be cherrypicked due to conflicts
9.0 Commit could not be cherrypicked due to conflicts
8.18 Commit could not be cherrypicked due to conflicts

You can use sqren/backport to manually backport by running backport --upstream elastic/elasticsearch --pr 129355

alex-spies added a commit to alex-spies/elasticsearch that referenced this pull request Jun 17, 2025
…lastic#129355)

* Add a test that demonstrates that LOOKUP JOIN works with union types.
* Fix usage of FieldAttribute#name where we should use FieldAttribute#fieldName in LOOKUP JOIN.
* Refactor LookupJoinTypesIT

This doesn't fix any bugs per se, but will avoid bwc problems once we propagate union type field attributes into Join LogicalPlan nodes.

(cherry picked from commit 65cbccc)

# Conflicts:
#	x-pack/plugin/esql/src/internalClusterTest/java/org/elasticsearch/xpack/esql/action/LookupFromIndexIT.java
#	x-pack/plugin/esql/src/internalClusterTest/java/org/elasticsearch/xpack/esql/action/LookupJoinTypesIT.java
#	x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/expression/function/fulltext/FullTextFunction.java
alex-spies added a commit to alex-spies/elasticsearch that referenced this pull request Jun 17, 2025
…lastic#129355)

* Add a test that demonstrates that LOOKUP JOIN works with union types.
* Fix usage of FieldAttribute#name where we should use FieldAttribute#fieldName in LOOKUP JOIN.
* Refactor LookupJoinTypesIT

This doesn't fix any bugs per se, but will avoid bwc problems once we propagate union type field attributes into Join LogicalPlan nodes.

(cherry picked from commit 65cbccc)

# Conflicts:
#	x-pack/plugin/esql/src/internalClusterTest/java/org/elasticsearch/xpack/esql/action/LookupJoinTypesIT.java
@alex-spies
Copy link
Contributor Author

💚 All backports created successfully

Status Branch Result
9.0
8.19
8.18

Questions ?

Please refer to the Backport tool documentation

alex-spies added a commit to alex-spies/elasticsearch that referenced this pull request Jun 17, 2025
…lastic#129355)

* Add a test that demonstrates that LOOKUP JOIN works with union types.
* Fix usage of FieldAttribute#name where we should use FieldAttribute#fieldName in LOOKUP JOIN.
* Refactor LookupJoinTypesIT

This doesn't fix any bugs per se, but will avoid bwc problems once we propagate union type field attributes into Join LogicalPlan nodes.

(cherry picked from commit 65cbccc)

# Conflicts:
#	x-pack/plugin/esql/src/internalClusterTest/java/org/elasticsearch/xpack/esql/action/LookupFromIndexIT.java
#	x-pack/plugin/esql/src/internalClusterTest/java/org/elasticsearch/xpack/esql/action/LookupJoinTypesIT.java
#	x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/expression/function/fulltext/FullTextFunction.java
elasticsearchmachine pushed a commit that referenced this pull request Jun 17, 2025
…129355) (#129577)

* Add a test that demonstrates that LOOKUP JOIN works with union types.
* Fix usage of FieldAttribute#name where we should use FieldAttribute#fieldName in LOOKUP JOIN.
* Refactor LookupJoinTypesIT

This doesn't fix any bugs per se, but will avoid bwc problems once we propagate union type field attributes into Join LogicalPlan nodes.

(cherry picked from commit 65cbccc)

# Conflicts:
#	x-pack/plugin/esql/src/internalClusterTest/java/org/elasticsearch/xpack/esql/action/LookupFromIndexIT.java
#	x-pack/plugin/esql/src/internalClusterTest/java/org/elasticsearch/xpack/esql/action/LookupJoinTypesIT.java
#	x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/expression/function/fulltext/FullTextFunction.java
elasticsearchmachine pushed a commit that referenced this pull request Jun 17, 2025
…129355) (#129579)

* Add a test that demonstrates that LOOKUP JOIN works with union types.
* Fix usage of FieldAttribute#name where we should use FieldAttribute#fieldName in LOOKUP JOIN.
* Refactor LookupJoinTypesIT

This doesn't fix any bugs per se, but will avoid bwc problems once we propagate union type field attributes into Join LogicalPlan nodes.

(cherry picked from commit 65cbccc)

# Conflicts:
#	x-pack/plugin/esql/src/internalClusterTest/java/org/elasticsearch/xpack/esql/action/LookupFromIndexIT.java
#	x-pack/plugin/esql/src/internalClusterTest/java/org/elasticsearch/xpack/esql/action/LookupJoinTypesIT.java
#	x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/expression/function/fulltext/FullTextFunction.java
elasticsearchmachine pushed a commit that referenced this pull request Jun 18, 2025
…129355) (#129578)

* Add a test that demonstrates that LOOKUP JOIN works with union types.
* Fix usage of FieldAttribute#name where we should use FieldAttribute#fieldName in LOOKUP JOIN.
* Refactor LookupJoinTypesIT

This doesn't fix any bugs per se, but will avoid bwc problems once we propagate union type field attributes into Join LogicalPlan nodes.

(cherry picked from commit 65cbccc)

# Conflicts:
#	x-pack/plugin/esql/src/internalClusterTest/java/org/elasticsearch/xpack/esql/action/LookupJoinTypesIT.java
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Analytics/ES|QL AKA ESQL auto-backport Automatically create backport pull requests when merged backport pending >refactoring Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) v8.18.3 v8.19.0 v9.0.3 v9.1.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants