branch-3.0: [fix](orc) Should not pass selection vector when decode child column of List or Map #50136 #50317
run buildall
TPC-H: Total hot run time: 40384 ms
TPC-DS: Total hot run time: 197469 ms
BE UT Coverage Report: Increment line coverage, Increment coverage report
ClickBench: Total hot run time: 33.44 s
…of List or Map (apache#50136)

Related PR: apache#18615

Problem Summary:

The problem is like apache/doris-thirdparty#256

When performing late materialization for LIST or MAP types, filters should not be applied directly to their child fields. These complex types rely on offsets to correctly map parent-child relationships within the columnar storage layout (e.g., in ORC or Parquet files). If filters are applied to the children of a LIST or MAP field, it may cause inconsistencies in the offset alignment, leading to incorrect data being read—such as mismatched elements, missing values, or even runtime errors. This breaks the structural integrity of the nested data and can produce incorrect query results.

```text
mysql> select * from complex_data_orc;
+------+--------------------------+-----------------+
| id   | m                        | l               |
+------+--------------------------+-----------------+
|    1 | {"a":1, "b":2}           | ["a", "b"]      |
|    2 | {"b":3, "c":4}           | ["b"]           |
|    3 | {"c":5, "a":6, "b":7}    | ["c", "a"]      |
|    4 | {"a":8, "c":9}           | ["b", "c"]      |
|    5 | {"b":10, "a":11}         | ["a"]           |
|    6 | {"c":12, "b":13}         | ["c"]           |
|    7 | {"a":15}                 | ["a", "a"]      |
|    8 | {"b":17}                 | ["b", "b"]      |
|    9 | {"c":19}                 | ["c", "c"]      |
|   10 | {"a":20, "b":21, "c":22} | ["a", "b", "c"] |
+------+--------------------------+-----------------+
10 rows in set (0.02 sec)

!!!WRONG RESULT:
mysql> select * from complex_data_orc where id > 2;
+------+--------------------------+----------------+
| id   | m                        | l              |
+------+--------------------------+----------------+
|    3 | {"c":5, "a":6, "b":7}    | ["c", "a"]     |
|    4 | {"a":8, "c":9}           | ["b", "c"]     |
|    5 | {"b":10, "":11}          | ["a"]          |
|    6 | {"":12, "":13}           | ["c"]          |
|    7 | {"":15}                  | ["a", ""]      |
|    8 | {"":17}                  | ["", ""]       |
|    9 | {"":19}                  | ["", ""]       |
|   10 | {"a":20, "b":21, "c":22} | ["", "b", "c"] |
+------+--------------------------+----------------+
8 rows in set (0.02 sec)
```

To ensure correctness, filters should only be applied at the top level of the LIST or MAP, and their children should be read in full when late materialization occurs.

After this pr:

```text
mysql> select * from complex_data_orc where id > 2;
+------+--------------------------+-----------------+
| id   | m                        | l               |
+------+--------------------------+-----------------+
|    3 | {"c":5, "a":6, "b":7}    | ["c", "a"]      |
|    4 | {"a":8, "c":9}           | ["b", "c"]      |
|    5 | {"b":10, "a":11}         | ["a"]           |
|    6 | {"c":12, "b":13}         | ["c"]           |
|    7 | {"a":15}                 | ["a", "a"]      |
|    8 | {"b":17}                 | ["b", "b"]      |
|    9 | {"c":19}                 | ["c", "c"]      |
|   10 | {"a":20, "b":21, "c":22} | ["a", "b", "c"] |
+------+--------------------------+-----------------+
8 rows in set (1.41 sec)
```
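
For readers unfamiliar with the offset layout, here is a toy Python model of the `l` column from the example above. It is only a sketch under stated assumptions and is not the ORC reader's actual code path: the helper names (`build_offsets`, `read_list_correct`, `read_list_buggy`) are hypothetical, and the "buggy" reader simply assumes the child decoder indexes the row-level selection mask by child position and leaves unselected positions as empty strings. It shows why a mask sized for rows cannot be reused on a child column that has more elements than rows.

```python
from typing import List, Sequence

# The `l` column from the PR example, one Python list per row (ids 1..10).
L_COLUMN: List[List[str]] = [
    ["a", "b"], ["b"], ["c", "a"], ["b", "c"], ["a"],
    ["c"], ["a", "a"], ["b", "b"], ["c", "c"], ["a", "b", "c"],
]
IDS = list(range(1, 11))


def build_offsets(rows: Sequence[Sequence[str]]) -> List[int]:
    """Row i owns child[offsets[i]:offsets[i + 1]] in the flattened child column."""
    offsets = [0]
    for row in rows:
        offsets.append(offsets[-1] + len(row))
    return offsets


def read_list_correct(offsets, child, selected):
    """Read the child column in full, then filter only at the row (parent) level."""
    return [
        child[offsets[i]:offsets[i + 1]]
        for i, keep in enumerate(selected)
        if keep
    ]


def read_list_buggy(offsets, child, selected):
    """Hypothetical bug model: the row-level mask is applied to child positions.

    The mask has one entry per row, but the child column has one entry per
    element, so positions that are unselected or past the end of the mask are
    never decoded and stay empty.
    """
    decoded = [
        value if pos < len(selected) and selected[pos] else ""
        for pos, value in enumerate(child)
    ]
    return [
        decoded[offsets[i]:offsets[i + 1]]
        for i, keep in enumerate(selected)
        if keep
    ]


offsets = build_offsets(L_COLUMN)                  # 11 entries for 10 rows
child = [v for row in L_COLUMN for v in row]       # 18 flattened elements
selected = [row_id > 2 for row_id in IDS]          # predicate: id > 2

good = read_list_correct(offsets, child, selected)
bad = read_list_buggy(offsets, child, selected)
```

In this model `good` equals the last eight rows of the original column, while `bad` keeps the first few selected rows intact and then degrades into empty strings once the child positions run past what the 10-entry mask covers. That is the same kind of corruption as in the wrong result above, though the exact cells differ because the real decoder is more involved.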
f0b9807 to 2d9f03e
run buildall
BE UT Coverage Report: Increment line coverage, Increment coverage report
LGTM
bp: #50136