Row type statistics missing in Delta Lake #25566

Open
@chenjian2664

Description

Currently, the Delta Lake connector does not write column statistics for columns of row type.

Here are minimal steps to reproduce with the Delta Lake connector:

trino:tpch> CREATE TABLE t (id int, v1 int, r row(n1 int, n2 row(n3 int, n4 int)));
CREATE TABLE
trino:tpch> INSERT INTO t values (1, 1, row(1, row(1,1)));
INSERT: 1 row

Query 20250414_080903_00003_2ffn7, FINISHED, 2 nodes
Splits: 15 total, 15 done (100.00%)
2.15 [0 rows, 0B] [0 rows/s, 0B/s]

trino:tpch> select * from t;
 id | v1 |            r            
----+----+-------------------------
  1 |  1 | {n1=1, n2={n3=1, n4=1}} 
(1 row)

Query 20250414_080909_00004_2ffn7, FINISHED, 1 node
Splits: 1 total, 1 done (100.00%)
0.85 [1 rows, 691B] [1 rows/s, 810B/s]

trino:tpch> show stats for (select * from t);
 column_name | data_size | distinct_values_count | nulls_fraction | row_count | low_value | high_value 
-------------+-----------+-----------------------+----------------+-----------+-----------+------------
 id          |      NULL |                   1.0 |            0.0 |      NULL | 1         | 1          
 v1          |      NULL |                   1.0 |            0.0 |      NULL | 1         | 1          
 r           |      NULL |                  NULL |           NULL |      NULL | NULL      | NULL       
 NULL        |      NULL |                  NULL |           NULL |       1.0 | NULL      | NULL       
(4 rows)

The stats entry written to the transaction log file has no values for r:

"stats":"{\"numRecords\":1,\"minValues\":{\"id\":1,\"v1\":1},\"maxValues\":{\"id\":1,\"v1\":1},\"nullCount\":{\"id\":0,\"v1\":0}}"
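A minimal sketch for checking which columns are missing from a stats entry like the one above. The helper name columns_missing_stats and the expected_stats value are hypothetical, not part of Trino. The nested layout in expected_stats is an assumption based on the Delta protocol's description of per-column statistics mirroring the data schema; it is not output produced by any engine.

```python
import json

# The "stats" string from the log entry above, with the outer JSON escaping removed.
stats_json = (
    '{"numRecords":1,"minValues":{"id":1,"v1":1},'
    '"maxValues":{"id":1,"v1":1},"nullCount":{"id":0,"v1":0}}'
)

# Top-level columns of table t, taken from the CREATE TABLE statement.
table_columns = ["id", "v1", "r"]


def columns_missing_stats(stats, columns):
    """Return the columns that appear in none of minValues, maxValues, nullCount."""
    sections = ("minValues", "maxValues", "nullCount")
    return [c for c in columns if all(c not in stats.get(s, {}) for s in sections)]


stats = json.loads(stats_json)
missing = columns_missing_stats(stats, table_columns)
print(missing)  # ['r']

# Hypothetical stats entry with nested values for r (assumed layout, see above).
nested_r = {"n1": 1, "n2": {"n3": 1, "n4": 1}}
expected_stats = {
    "numRecords": 1,
    "minValues": {"id": 1, "v1": 1, "r": nested_r},
    "maxValues": {"id": 1, "v1": 1, "r": nested_r},
    "nullCount": {"id": 0, "v1": 0, "r": {"n1": 0, "n2": {"n3": 0, "n4": 0}}},
}
print(columns_missing_stats(expected_stats, table_columns))  # []
```

Running this against the stats entry from the issue reports r as the only column without statistics, which matches the NULL row for r in the SHOW STATS output.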
