Open
Labels: Icechunk 🧊 (Relates to Icechunk library / spec), bug (Something isn't working)
Description
Hi,
I've been trying to virtualize GOES data (see #728) and have hit another snag. I'm not sure whether this is an Icechunk bug or a VirtualiZarr one, so I'm happy to move it to the Icechunk repo if desired.
Using the current `main` branch of VirtualiZarr and Icechunk 1.0.3, running the following code raises an error when appending the second virtual dataset to the Icechunk store. The same happens with other files. By contrast, when I open these files with `open_virtual_mfdataset` and write the result to an Icechunk store in one go, it works; the failure only occurs when the files are written one after another.
```python
import icechunk
from icechunk import local_filesystem_storage
from obstore.store import from_url
from virtualizarr.parsers import HDFParser
from virtualizarr.registry import ObjectStoreRegistry
from virtualizarr.xarray import open_virtual_dataset

path1 = "s3://noaa-goes19/ABI-L2-MCMIPC/2025/001/00/OR_ABI-L2-MCMIPC-M6_G19_s20250010001173_e20250010003557_c20250010004071.nc"
path2 = "s3://noaa-goes19/ABI-L2-MCMIPC/2025/001/00/OR_ABI-L2-MCMIPC-M6_G19_s20250010006173_e20250010008546_c20250010009059.nc"

# Unsigned (anonymous) access to the public NOAA GOES-19 bucket
parser = HDFParser()
bucket = "s3://noaa-goes19"
store = from_url(bucket, skip_signature=True)
registry = ObjectStoreRegistry({bucket: store})

# Local Icechunk repository
storage = local_filesystem_storage("vds_test.icechunk")
repo = icechunk.Repository.open_or_create(storage)

# First file: written as the initial contents of the store
session = repo.writable_session("main")
vds = open_virtual_dataset(path1, parser=parser, registry=registry, loadable_variables=["t", "x", "y"])
vds = vds.expand_dims("t")
print(vds)
vds.vz.to_icechunk(session.store)
session.commit("Add first file")

# Second file: appended along "t" in a new session
vds2 = open_virtual_dataset(path2, parser=parser, registry=registry, loadable_variables=["t", "x", "y"])
vds2 = vds2.expand_dims("t")
print(vds2)
session = repo.writable_session("main")
vds2.vz.to_icechunk(session.store, append_dim="t")
session.commit("Add second file")  # this commit raises the IcechunkError
```
This results in the following error:
```
Traceback (most recent call last):
  File "/Users/jacob/Development/planetary-datasets/dags/assets/icechunk_prover.py", line 33, in <module>
    snapshot_id = session.commit("Add second file")
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/jacob/Development/planetary-datasets/.pixi/envs/default/lib/python3.12/site-packages/icechunk/session.py", line 226, in commit
    return self._session.commit(
           ^^^^^^^^^^^^^^^^^^^^^
icechunk.IcechunkError: x session error: failed to create manifest from chunk stream
|
| context:
|   0: icechunk::session::_commit
|      with Add second file rewrite_manifests=false
|      at icechunk/src/session.rs:987
|   1: icechunk::session::commit
|      with Add second file
|      at icechunk/src/session.rs:950
|
```
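For comparison, the all-at-once path that reportedly works could look roughly like the sketch below. It is not the exact call used in this report: the helper names `goes_start_time` and `write_all_at_once`, the store path `vds_test_mf.icechunk`, and the `combine="nested"`, `concat_dim="t"`, and `preprocess` arguments are all assumptions and may need adjusting (for example, different `coords` or `compat` settings). The files are sorted by the scan start time encoded in the `_sYYYYJJJHHMMSSt` token of the file name, since a nested combine concatenates in the order given.

```python
import re
from datetime import datetime, timedelta

# Scan-start token of a GOES-R file name, e.g. "_s20250010001173_":
# year, day of year, hour, minute, second, tenths of a second.
_START = re.compile(r"_s(\d{4})(\d{3})(\d{2})(\d{2})(\d{2})(\d)_")


def goes_start_time(url: str) -> datetime:
    """Hypothetical helper: parse the scan start time from a GOES-R file name."""
    match = _START.search(url)
    if match is None:
        raise ValueError(f"no start-time token in {url!r}")
    year, doy, hour, minute, second, tenths = map(int, match.groups())
    return datetime(year, 1, 1) + timedelta(
        days=doy - 1,
        hours=hour,
        minutes=minute,
        seconds=second,
        milliseconds=100 * tenths,
    )


def write_all_at_once(urls, store_path="vds_test_mf.icechunk"):
    """Hypothetical helper: virtualize all files together and commit once."""
    # Imports are deferred so goes_start_time works without these packages.
    import icechunk
    from icechunk import local_filesystem_storage
    from obstore.store import from_url
    from virtualizarr.parsers import HDFParser
    from virtualizarr.registry import ObjectStoreRegistry
    from virtualizarr.xarray import open_virtual_mfdataset

    bucket = "s3://noaa-goes19"
    registry = ObjectStoreRegistry({bucket: from_url(bucket, skip_signature=True)})

    # ASSUMPTION: the combine/concat/preprocess arguments mirror the
    # expand_dims("t") step of the per-file code; the report does not show them.
    vds = open_virtual_mfdataset(
        sorted(urls, key=goes_start_time),
        parser=HDFParser(),
        registry=registry,
        combine="nested",
        concat_dim="t",
        preprocess=lambda ds: ds.expand_dims("t"),
        loadable_variables=["t", "x", "y"],
    )

    # Single write and single commit, with no append step.
    repo = icechunk.Repository.open_or_create(local_filesystem_storage(store_path))
    session = repo.writable_session("main")
    vds.vz.to_icechunk(session.store)
    return session.commit("Add all files at once")
```

Calling `write_all_at_once([path1, path2])` with the two paths from the reproduction needs network access and the same package versions. Only `goes_start_time` is pure standard library, and it can also be used to check that files are appended in ascending time order.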
TomNicholas