Row type statistics missing in Delta Lake #25566

Open
chenjian2664 opened this issue Apr 14, 2025 · 3 comments
Labels
delta-lake Delta Lake connector

Comments

@chenjian2664
Contributor

Currently, the Delta Lake connector does not write column statistics for row type columns.

Here is a minimal reproduction in the Delta Lake connector:

trino:tpch> CREATE TABLE t (id int, v1 int, r row(n1 int, n2 row(n3 int, n4 int)));
CREATE TABLE
trino:tpch> INSERT INTO t values (1, 1, row(1, row(1,1)));
INSERT: 1 row

Query 20250414_080903_00003_2ffn7, FINISHED, 2 nodes
Splits: 15 total, 15 done (100.00%)
2.15 [0 rows, 0B] [0 rows/s, 0B/s]

trino:tpch> select * from t;
 id | v1 |            r            
----+----+-------------------------
  1 |  1 | {n1=1, n2={n3=1, n4=1}} 
(1 row)

Query 20250414_080909_00004_2ffn7, FINISHED, 1 node
Splits: 1 total, 1 done (100.00%)
0.85 [1 rows, 691B] [1 rows/s, 810B/s]

trino:tpch> show stats for (select * from t);
 column_name | data_size | distinct_values_count | nulls_fraction | row_count | low_value | high_value 
-------------+-----------+-----------------------+----------------+-----------+-----------+------------
 id          |      NULL |                   1.0 |            0.0 |      NULL | 1         | 1          
 v1          |      NULL |                   1.0 |            0.0 |      NULL | 1         | 1          
 r           |      NULL |                  NULL |           NULL |      NULL | NULL      | NULL       
 NULL        |      NULL |                  NULL |           NULL |       1.0 | NULL      | NULL       
(4 rows)

In the transaction log file:

"stats":"{\"numRecords\":1,\"minValues\":{\"id\":1,\"v1\":1},\"maxValues\":{\"id\":1,\"v1\":1},\"nullCount\":{\"id\":0,\"v1\":0}}"
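To check which columns a log entry actually covers, the `stats` field can be decoded twice, since it is a JSON string embedded in JSON. A minimal Python sketch follows, assuming the fragment above is wrapped in braces so that it parses on its own (in a real log entry it sits inside an `add` action); `columns_with_stats` is a hypothetical helper name, not part of Trino or Delta Lake:

```python
import json

# The fragment quoted above, wrapped in braces so it is a valid JSON object.
# Assumption: in the actual log file this field is nested inside an "add" action.
FRAGMENT = r'{"stats":"{\"numRecords\":1,\"minValues\":{\"id\":1,\"v1\":1},\"maxValues\":{\"id\":1,\"v1\":1},\"nullCount\":{\"id\":0,\"v1\":0}}"}'


def columns_with_stats(action_json: str) -> dict:
    """Return the column names present under each per-column statistic."""
    outer = json.loads(action_json)
    # "stats" is itself a JSON-encoded string, so it needs a second decode.
    stats = json.loads(outer["stats"])
    return {
        key: sorted(stats.get(key, {}))
        for key in ("minValues", "maxValues", "nullCount")
    }


covered = columns_with_stats(FRAGMENT)
# Columns of table t that have no min/max entry in this log fragment.
missing = {"id", "v1", "r"} - set(covered["minValues"])

print(covered)
print(missing)
```

For the fragment above, only `id` and `v1` appear under each statistic, and `r` is the column left without an entry.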
@chenjian2664 chenjian2664 added the delta-lake Delta Lake connector label Apr 14, 2025
@chenjian2664
Contributor Author

continue; // Only base column stats are supported

@ebyhr @findinpath Any particular reason?
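For reference, here is a rough Python sketch of the difference between skipping row columns and recursing into them. It is not the connector's code. It assumes, based on my reading of the Delta protocol, that statistics for nested fields are written with the same nested structure as the column itself. The schema encoding (`None` marks a leaf field) and the names `fold`, `file_stats`, `SCHEMA` and `ROWS` are made up for illustration:

```python
import json

# Hypothetical schema encoding: None marks a leaf field, a dict marks a row type.
SCHEMA = {"id": None, "v1": None, "r": {"n1": None, "n2": {"n3": None, "n4": None}}}
# The single row inserted in the reproduction above.
ROWS = [{"id": 1, "v1": 1, "r": {"n1": 1, "n2": {"n3": 1, "n4": 1}}}]


def non_null(column):
    return [v for v in column if v is not None]


def fold(schema, values, leaf_fn, include_nested):
    """Apply leaf_fn to every leaf column, optionally recursing into row types."""
    out = {}
    for name, sub in schema.items():
        # A null parent row makes all of its fields null for that record.
        column = [None if v is None else v.get(name) for v in values]
        if sub is None:
            result = leaf_fn(column)
            if result is not None:
                out[name] = result
        elif include_nested:
            out[name] = fold(sub, column, leaf_fn, include_nested)
        # else: skip the row-typed column entirely (base columns only)
    return out


def file_stats(schema, rows, include_nested):
    return {
        "numRecords": len(rows),
        "minValues": fold(schema, rows, lambda c: min(non_null(c), default=None), include_nested),
        "maxValues": fold(schema, rows, lambda c: max(non_null(c), default=None), include_nested),
        "nullCount": fold(schema, rows, lambda c: c.count(None), include_nested),
    }


base_only = json.dumps(file_stats(SCHEMA, ROWS, include_nested=False), separators=(",", ":"))
with_nested = file_stats(SCHEMA, ROWS, include_nested=True)

print(base_only)
print(json.dumps(with_nested, separators=(",", ":")))
```

With `include_nested=False` the sketch produces the same stats JSON as the log entry above. With `include_nested=True` it adds an `r` entry carrying `n1`, `n2.n3` and `n2.n4` under each statistic. Whether the connector could then surface those values in `SHOW STATS` is a separate question that this sketch does not address.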

@findinpath
Contributor

@krvikash do you remember the reasoning here?

@krvikash
Contributor

Looks like this behavior dates back to when the Delta Lake connector was first added to Trino. Not sure about the reason.

9956543#diff-f4c2dad50fb91b940e14a1daac01fc71f906e5e7baa1e4c7a914b28488703997R191

Development

No branches or pull requests

3 participants