
Trying to append to icechunk fails #740

@jacobbieker

Hi,

I've been trying to virtualize GOES data (see #728) and have hit another snag. I'm not sure whether this is an Icechunk bug or a VirtualiZarr one, so I'm happy to move it to the Icechunk repo if desired.

Using the current main branch of VirtualiZarr and Icechunk 1.0.3, running the following code results in an error when appending the second virtual dataset to the Icechunk store. This happens with other files too. On the other hand, when I use open_virtual_mfdataset on these files and write the result to an Icechunk store in one go, it works; it only fails when the files are written one after another.

import icechunk
from icechunk import local_filesystem_storage
from virtualizarr.xarray import open_virtual_dataset
from virtualizarr.parsers import HDFParser
from virtualizarr.registry import ObjectStoreRegistry
from obstore.store import from_url

path1 = "s3://noaa-goes19/ABI-L2-MCMIPC/2025/001/00/OR_ABI-L2-MCMIPC-M6_G19_s20250010001173_e20250010003557_c20250010004071.nc"
path2 = "s3://noaa-goes19/ABI-L2-MCMIPC/2025/001/00/OR_ABI-L2-MCMIPC-M6_G19_s20250010006173_e20250010008546_c20250010009059.nc"
parser = HDFParser()
bucket = "s3://noaa-goes19"
# Unsigned requests (skip_signature=True)
store = from_url(bucket, skip_signature=True)
registry = ObjectStoreRegistry({bucket: store})

storage = local_filesystem_storage("vds_test.icechunk")
repo = icechunk.Repository.open_or_create(storage)

# First file: write into the store and commit (this works)
session = repo.writable_session("main")
vds = open_virtual_dataset(path1, parser=parser, registry=registry, loadable_variables=["t", "x", "y"])
vds = vds.expand_dims("t")
print(vds)
vds.vz.to_icechunk(session.store)
session.commit("Add first file")

# Second file: append along "t" in a fresh session (the commit raises)
vds2 = open_virtual_dataset(path2, parser=parser, registry=registry, loadable_variables=["t", "x", "y"])
vds2 = vds2.expand_dims("t")
print(vds2)
session = repo.writable_session("main")
vds2.vz.to_icechunk(session.store, append_dim="t")
session.commit("Add second file")

This results in the following error:

Traceback (most recent call last):
  File "/Users/jacob/Development/planetary-datasets/dags/assets/icechunk_prover.py", line 33, in <module>
    snapshot_id = session.commit("Add second file")
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/jacob/Development/planetary-datasets/.pixi/envs/default/lib/python3.12/site-packages/icechunk/session.py", line 226, in commit
    return self._session.commit(
           ^^^^^^^^^^^^^^^^^^^^^
icechunk.IcechunkError:   x session error: failed to create manifest from chunk stream
  | 
  | context:
  |    0: icechunk::session::_commit
  |            with Add second file rewrite_manifests=false
  |              at icechunk/src/session.rs:987
  |    1: icechunk::session::commit
  |            with Add second file
  |              at icechunk/src/session.rs:950
  | 
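
As a quick sanity check on the two inputs, the scan start times can be read straight from the file names to confirm the files are distinct and in ascending order along "t". This is only a minimal sketch: it assumes the usual GOES-R naming convention, where the `s` field is year, day of year, hour, minute, second, and tenths of a second, and `goes_start_time` is a hypothetical helper, not part of VirtualiZarr or Icechunk.

```python
import re
from datetime import datetime, timedelta

# Hypothetical helper, not part of VirtualiZarr or Icechunk.
# Assumes the GOES-R file name field _sYYYYJJJHHMMSSt_, where JJJ is the
# day of year and t is tenths of a second.
_START_FIELD = re.compile(r"_s(\d{4})(\d{3})(\d{2})(\d{2})(\d{2})(\d)_")


def goes_start_time(path: str) -> datetime:
    """Parse the scan start time encoded in a GOES-R file name."""
    match = _START_FIELD.search(path)
    if match is None:
        raise ValueError(f"no start-time field found in {path!r}")
    year, doy, hour, minute, second, tenth = (int(g) for g in match.groups())
    # Day of year is 1-based, so day 001 is January 1st.
    return datetime(year, 1, 1) + timedelta(
        days=doy - 1,
        hours=hour,
        minutes=minute,
        seconds=second,
        milliseconds=100 * tenth,
    )


path1 = "s3://noaa-goes19/ABI-L2-MCMIPC/2025/001/00/OR_ABI-L2-MCMIPC-M6_G19_s20250010001173_e20250010003557_c20250010004071.nc"
path2 = "s3://noaa-goes19/ABI-L2-MCMIPC/2025/001/00/OR_ABI-L2-MCMIPC-M6_G19_s20250010006173_e20250010008546_c20250010009059.nc"

t1, t2 = goes_start_time(path1), goes_start_time(path2)
print(t1, t2, "ascending:", t2 > t1)
```

For the two files above the start times differ, with the second file later than the first, so the append order itself does not look like the problem.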


Labels: Icechunk 🧊 (Relates to Icechunk library / spec), bug (Something isn't working)
