branch-2.1: [fix](orc) Should not pass selection vector when decode child column of List or Map #50136 #50316

Merged

suxiaogang223
Contributor

bp: #50136

@suxiaogang223 suxiaogang223 requested a review from yiguolei as a code owner April 23, 2025 02:45
@hello-stephen
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@suxiaogang223
Contributor Author

run buildall

@hello-stephen
Contributor

BE UT Coverage Report

Increment line coverage 100.00% (7/7) 🎉

Increment coverage report
Complete coverage report

| Category | Coverage |
|----------|----------|
| Function Coverage | 38.81% (10255/26422) |
| Line Coverage | 29.86% (85129/285070) |
| Region Coverage | 28.52% (43876/153862) |
| Branch Coverage | 25.25% (22434/88860) |

@hello-stephen
Contributor

BE Regression P0 && UT Coverage Report

Increment line coverage 100.00% (7/7) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage
Line Coverage
Region Coverage
Branch Coverage

@suxiaogang223 suxiaogang223 force-pushed the pick_fix_orc_lazy_read_2.1 branch from d67913f to 2d96368 Compare April 24, 2025 02:15
…of List or Map (apache#50136)

Related PR: apache#18615

Problem Summary:
The problem is similar to apache/doris-thirdparty#256.
When performing late materialization for LIST or MAP types, the filter
(selection vector) must not be applied directly to their child fields.
These complex types rely on offsets to map each parent row to its child
elements within the columnar storage layout (e.g., in ORC or Parquet files).

If a filter is applied to the children of a LIST or MAP field, the child
data can fall out of alignment with the parent offsets, so incorrect data
is read: mismatched elements, missing values, or even runtime errors.
This breaks the structural integrity of the nested data and produces
incorrect query results, as in the example below.
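
The misalignment can be illustrated with a minimal Python sketch. It assumes a simplified in-memory layout (an offsets array plus one flat child array) and a hypothetical helper, `filter_child_by_row_index`, that mimics applying a row-level selection to child elements. It is not the ORC reader's actual code and is not meant to reproduce the exact wrong output shown in this PR.

```python
# Hypothetical layout of a LIST<STRING> column: one offsets array for the
# parent rows plus one flat array holding every child element.
offsets = [0, 2, 3, 5, 7]          # row i owns child[offsets[i]:offsets[i + 1]]
child = ["a", "b", "b", "c", "a", "b", "c"]
selected_rows = [2, 3]             # 0-based rows kept by a predicate such as id > 2


def rows_from(offsets, child):
    """Rebuild each row's list from the offsets and the flat child array."""
    return [child[offsets[i]:offsets[i + 1]] for i in range(len(offsets) - 1)]


def filter_child_by_row_index(child, selected_rows):
    """Incorrect on purpose: treats row indices as child element indices."""
    keep = set(selected_rows)
    return [v if i in keep else "" for i, v in enumerate(child)]


full = rows_from(offsets, child)
broken = rows_from(offsets, filter_child_by_row_index(child, selected_rows))

print(full[2], full[3])      # rows as stored
print(broken[2], broken[3])  # same rows after the misapplied filter
```

Row indices and child element indices only coincide when every list has exactly one element, which is why a selection built for rows cannot be reused for the children.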

```text
mysql> select * from complex_data_orc;
+------+--------------------------+-----------------+
| id   | m                        | l               |
+------+--------------------------+-----------------+
|    1 | {"a":1, "b":2}           | ["a", "b"]      |
|    2 | {"b":3, "c":4}           | ["b"]           |
|    3 | {"c":5, "a":6, "b":7}    | ["c", "a"]      |
|    4 | {"a":8, "c":9}           | ["b", "c"]      |
|    5 | {"b":10, "a":11}         | ["a"]           |
|    6 | {"c":12, "b":13}         | ["c"]           |
|    7 | {"a":15}                 | ["a", "a"]      |
|    8 | {"b":17}                 | ["b", "b"]      |
|    9 | {"c":19}                 | ["c", "c"]      |
|   10 | {"a":20, "b":21, "c":22} | ["a", "b", "c"] |
+------+--------------------------+-----------------+
10 rows in set (0.02 sec)

!!!WRONG RESULT:
mysql> select * from complex_data_orc where id > 2;
+------+--------------------------+----------------+
| id   | m                        | l              |
+------+--------------------------+----------------+
|    3 | {"c":5, "a":6, "b":7}    | ["c", "a"]     |
|    4 | {"a":8, "c":9}           | ["b", "c"]     |
|    5 | {"b":10, "":11}          | ["a"]          |
|    6 | {"":12, "":13}           | ["c"]          |
|    7 | {"":15}                  | ["a", ""]      |
|    8 | {"":17}                  | ["", ""]       |
|    9 | {"":19}                  | ["", ""]       |
|   10 | {"a":20, "b":21, "c":22} | ["", "b", "c"] |
+------+--------------------------+----------------+
8 rows in set (0.02 sec)
```

To ensure correctness, filters should only be applied at the top level
of the LIST or MAP, and their children should be read in full when late
materialization occurs.
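
As a rough sketch of that rule (again in Python, with a hypothetical `select_rows` helper and a simplified offsets-plus-children layout rather than the real ORC decoder), the children are decoded in full and the selection is applied only to parent rows, using the offsets to copy each selected row's slice:

```python
def select_rows(offsets, children, selected_rows):
    """Apply a row selection at the LIST/MAP level.

    `children` is a list of fully decoded child arrays that share `offsets`
    (one array for a LIST, keys and values for a MAP).
    """
    new_offsets = [0]
    new_children = [[] for _ in children]
    for row in selected_rows:
        start, end = offsets[row], offsets[row + 1]
        for out, src in zip(new_children, children):
            out.extend(src[start:end])
        new_offsets.append(new_offsets[-1] + (end - start))
    return new_offsets, new_children


# First four rows of column `m` from the example table, laid out as a MAP.
offsets = [0, 2, 4, 7, 9]
keys = ["a", "b", "b", "c", "c", "a", "b", "a", "c"]
values = [1, 2, 3, 4, 5, 6, 7, 8, 9]

# Keep rows with id 3 and 4 (0-based indices 2 and 3).
new_offsets, (keys_out, values_out) = select_rows(offsets, [keys, values], [2, 3])
print(new_offsets, keys_out, values_out)
```

Because the slices are taken through the offsets, keys and values stay paired and each surviving row keeps its original elements.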

After this PR:
```text
mysql> select * from complex_data_orc where id > 2;
+------+--------------------------+-----------------+
| id   | m                        | l               |
+------+--------------------------+-----------------+
|    3 | {"c":5, "a":6, "b":7}    | ["c", "a"]      |
|    4 | {"a":8, "c":9}           | ["b", "c"]      |
|    5 | {"b":10, "a":11}         | ["a"]           |
|    6 | {"c":12, "b":13}         | ["c"]           |
|    7 | {"a":15}                 | ["a", "a"]      |
|    8 | {"b":17}                 | ["b", "b"]      |
|    9 | {"c":19}                 | ["c", "c"]      |
|   10 | {"a":20, "b":21, "c":22} | ["a", "b", "c"] |
+------+--------------------------+-----------------+
8 rows in set (1.41 sec)
```
@suxiaogang223 suxiaogang223 force-pushed the pick_fix_orc_lazy_read_2.1 branch from 2d96368 to 5b87c30 Compare April 24, 2025 02:16
@suxiaogang223
Contributor Author

run buildall

@doris-robot

BE UT Coverage Report

Increment line coverage 100.00% (7/7) 🎉

Increment coverage report
Complete coverage report

| Category | Coverage |
|----------|----------|
| Function Coverage | 38.84% (10269/26438) |
| Line Coverage | 29.90% (85302/285286) |
| Region Coverage | 28.55% (43966/153976) |
| Branch Coverage | 25.27% (22478/88938) |

@hello-stephen
Contributor

BE Regression P0 && UT Coverage Report

Increment line coverage 100.00% (7/7) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage
Line Coverage
Region Coverage
Branch Coverage

@yiguolei yiguolei merged commit 0710d9b into apache:branch-2.1 Apr 25, 2025
18 of 19 checks passed
4 participants