Replies: 1 comment
Hey @gero90, thanks for bringing this up! Created an issue to track this feature request's progress: #3823
If there is any way to estimate Parquet file size in `df.write_iceberg()`, it would be really nice to try to get Parquet files close in size to the Iceberg table property `write.target-file-size-bytes` (default is 512 MiB). Having Parquet files close to that size makes Iceberg reads more efficient, and there is less table maintenance (compaction) to perform.

As an example, where I know the total data is small, I'm doing `df.into_partitions(1)` right before `df.write_iceberg()` to get a single file per write. Thanks in advance for taking a look and for making Daft awesome!