Description
Apache Iceberg version
1.9.1 (latest release)
Query engine
Trino
Please describe the bug 🐞
In Spark, we create a nested struct field `address.street`. The outermost field `address` is optional, but the innermost field `street` is required. When querying with Trino using the condition `address.street is null` with projection pushdown disabled, Trino reads the entire file and returns those rows where `address` is null (and thus `address.street` is null). However, when using projection pushdown, Trino delegates the planning decision to Iceberg and it seems to get no eligible files to read, leading to no rows returned.
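To make the result with pushdown disabled concrete, here is a minimal Python sketch of null-propagating field access, applied to the two rows inserted in the reproduction further down. It is only a model of the semantics described above, not Trino or Iceberg code; the `deref` helper and the dict-based row layout are assumptions made for illustration.

```python
# Rows mirror the two INSERT statements in this report; dicts stand in for structs.
rows = [
    {"id": 0, "name": "Jane Doe", "age": 27, "address": None},
    {
        "id": 1,
        "name": "John Doe",
        "age": 30,
        "address": {
            "street": "123 Main St",
            "address_info": {
                "city": "San Francisco",
                "county": "San Francisco County",
                "state": "California",
            },
        },
    },
]


def deref(value, path):
    """Follow a dotted path, yielding None as soon as any parent is None (hypothetical helper)."""
    for name in path.split("."):
        if value is None:
            return None
        value = value[name]
    return value


# Equivalent of: SELECT id FROM ... WHERE address.street IS NULL
matching_ids = [row["id"] for row in rows if deref(row, "address.street") is None]
print(matching_ids)  # [0]
```

Under this reading, the row with a null `address` matches even though `street` itself is declared required, which is the result returned with projection pushdown disabled.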
I can't find anything in the docs that says what the right behavior is here (that is, does `address.street is null` mean that `address` exists and `address.street` is null, or that `address.street` is not set in that row in any way), but agreement between Iceberg and Trino is essential.
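The two readings can be sketched as two ways of deciding whether a nested field can ever be null. The Python below is a hypothetical model, not Iceberg's implementation: the `Field` class and both `may_be_null_*` functions are invented for illustration, and it is an assumption (not confirmed in this report) that the empty result comes from a planner concluding that a required leaf field can never be null.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Field:
    """Hypothetical stand-in for a schema field; not an Iceberg class."""
    name: str
    required: bool
    children: List["Field"] = field(default_factory=list)


# The address column from the CREATE TABLE statement in this report.
SCHEMA = [
    Field("address", required=False, children=[
        Field("street", required=True),
        Field("address_info", required=False, children=[
            Field("city", required=True),
            Field("county", required=True),
            Field("state", required=True),
        ]),
    ]),
]


def resolve(fields, path):
    """Return the chain of fields from the top-level column down to the leaf."""
    chain = []
    for name in path.split("."):
        current = next(f for f in fields if f.name == name)
        chain.append(current)
        fields = current.children
    return chain


def may_be_null_leaf_only(chain):
    """Reading 1: only the leaf's own required/optional flag matters."""
    return not chain[-1].required


def may_be_null_with_ancestors(chain):
    """Reading 2: the value is null if the leaf or any enclosing struct is optional."""
    return any(not f.required for f in chain)


chain = resolve(SCHEMA, "address.street")
print(may_be_null_leaf_only(chain))       # False
print(may_be_null_with_ancestors(chain))  # True
```

If a planner used the first reading, `address.street is null` could be treated as never true and every file skipped; the second reading keeps the file containing the row whose `address` is null.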
Here are the spark-sql commands that I used to create and populate the table:
spark-sql> CREATE TABLE default.dejan_test (
id INT NOT NULL,
name STRING NOT NULL,
age INT NOT NULL,
address STRUCT<street: STRING NOT NULL, address_info: STRUCT<city: STRING NOT NULL, county: STRING NOT NULL, state: STRING NOT NULL>>)
USING iceberg;
spark-sql> INSERT INTO default.dejan_test (id, name, age, address)
VALUES (
0,
'Jane Doe',
27,
NULL
);
spark-sql> INSERT INTO default.dejan_test (id, name, age, address)
VALUES (
1,
'John Doe',
30,
STRUCT(
'123 Main St',
STRUCT('San Francisco', 'San Francisco County', 'California')
)
);
Here are the two different results we get from Trino:
trino>
set session iceberg.projection_pushdown_enabled=false;
SET SESSION
trino>
select
id
from
iceberg.default.dejan_test
where
address.street is null;
id
----
0
(1 row)
Query 20250613_033713_00001_xn59q, FINISHED, 1 node
Splits: 2 total, 2 done (100.00%)
2.85 [2 rows, 4.43KiB] [0 rows/s, 1.56KiB/s]
trino>
set session iceberg.projection_pushdown_enabled=true;
SET SESSION
trino>
select
id
from
iceberg.default.dejan_test
where
address.street is null;
id
----
(0 rows)
Query 20250613_034027_00008_xn59q, FINISHED, 1 node
Splits: 1 total, 1 done (100.00%)
0.36 [0 rows, 0B] [0 rows/s, 0B/s]
The full Trino issue and reproduction steps can be found here: trinodb/trino#20511 (comment)
Willingness to contribute
- I can contribute a fix for this bug independently
- I would be willing to contribute a fix for this bug with guidance from the Iceberg community
- I cannot contribute a fix for this bug at this time