Required fields within optional fields cause incorrect results in Trino #13328

Open
@dejangvozdenac

Apache Iceberg version

1.9.1 (latest release)

Query engine

Trino

Please describe the bug 🐞

In Spark, we create a table with a nested struct column address that contains a field street. The outer field address is optional, but the inner field street is required. When Trino runs a query with the condition address.street is null and projection pushdown is disabled, it reads the entire file and returns the rows where address is null (and thus address.street is null). With projection pushdown enabled, however, Trino delegates the planning decision to Iceberg, which seems to find no eligible files to read, so no rows are returned.

I can't find anything in the docs that specifies the correct behavior here: does address.street is null mean that address exists and address.street is null, or that address.street is not set in that row in any way? Either way, Iceberg and Trino need to agree.
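To make the two interpretations concrete, here is a minimal Python sketch. It is not Iceberg or Trino code; the `Field` class and the helper names are hypothetical and exist only for illustration. It assumes, without confirmation from this issue, that the empty result comes from a planner treating IS NULL on a required leaf as "always false" without looking at optional ancestors.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class Field:
    """Hypothetical stand-in for a schema field (not an Iceberg class)."""
    name: str
    required: bool


# Path address.street from the table above: optional parent, required leaf.
PATH = [Field("address", required=False), Field("street", required=True)]

ROWS = [
    {"id": 0, "address": None},
    {"id": 1, "address": {"street": "123 Main St"}},
]


def resolve(row: dict, path: list) -> Optional[Any]:
    """Walk a nested path; a null ancestor makes the leaf value null."""
    value = row
    for field in path:
        if value is None:
            return None
        value = value.get(field.name)
    return value


def can_be_null_leaf_only(path: list) -> bool:
    """Assumed buggy rule: only the leaf's own nullability is checked."""
    return not path[-1].required


def can_be_null_with_ancestors(path: list) -> bool:
    """Alternative rule: the value can be null if any field on the path is optional."""
    return any(not field.required for field in path)


def is_null_filter(rows: list, path: list, can_be_null) -> list:
    """Return ids of rows matching `path IS NULL` under a nullability rule."""
    if not can_be_null(path):
        # Predicate folded to "always false": no data is read at all.
        return []
    return [row["id"] for row in rows if resolve(row, path) is None]


leaf_only_ids = is_null_filter(ROWS, PATH, can_be_null_leaf_only)
ancestor_aware_ids = is_null_filter(ROWS, PATH, can_be_null_with_ancestors)
```

Under the leaf-only rule the filter matches nothing, which mirrors the result with projection pushdown enabled; under the ancestor-aware rule it matches id 0, which mirrors the result with pushdown disabled.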

Here are the spark-sql commands that I used to create the table:

spark-sql>  CREATE TABLE default.dejan_test (
  id INT NOT NULL,
  name STRING NOT NULL,
  age INT NOT NULL,
  address STRUCT<street: STRING NOT NULL, address_info: STRUCT<city: STRING NOT NULL, county: STRING NOT NULL, state: STRING NOT NULL>>)
USING iceberg;
spark-sql> INSERT INTO default.dejan_test (id, name, age, address)
VALUES (
  0, 
  'Jane Doe', 
  27, 
  NULL
);
spark-sql> INSERT INTO default.dejan_test (id, name, age, address)
VALUES (
  1, 
  'John Doe', 
  30, 
  STRUCT(
    '123 Main St',
    STRUCT('San Francisco', 'San Francisco County', 'California')
  )
);

Here are the two different results we get from Trino:

trino> 
set session iceberg.projection_pushdown_enabled=false;
SET SESSION
trino> 
select
  id
from
  iceberg.default.dejan_test
where
  address.street is null;
 id 
----
  0 
(1 row)

Query 20250613_033713_00001_xn59q, FINISHED, 1 node
Splits: 2 total, 2 done (100.00%)
2.85 [2 rows, 4.43KiB] [0 rows/s, 1.56KiB/s]


trino> 
set session iceberg.projection_pushdown_enabled=true;
SET SESSION
trino> 
select
  id
from
  iceberg.default.dejan_test
where
  address.street is null;
 id 
----
(0 rows)

Query 20250613_034027_00008_xn59q, FINISHED, 1 node
Splits: 1 total, 1 done (100.00%)
0.36 [0 rows, 0B] [0 rows/s, 0B/s]

The full Trino issue and reproduction steps can be found here: trinodb/trino#20511 (comment)

Willingness to contribute

  • I can contribute a fix for this bug independently
  • I would be willing to contribute a fix for this bug with guidance from the Iceberg community
  • I cannot contribute a fix for this bug at this time

Labels: bug