The NISAR test currently fails because one attribute has the value inf (the float), which leads to ValueError: Out of range float values are not JSON compliant: inf when trying to write to either Icechunk or Kerchunk. I wonder how we should handle non-JSON-serializable attributes with Zarr V3? Some options:
- Add a parameter to `to_icechunk` and `to_kerchunk` that gives the user the option to raise an error, drop the attribute, or cast it to a string
- Catch the upstream error and raise a more informative error about which variable / attribute is causing the issue
- Defer to parsers and provide documentation about the requirement for objects to be JSON serializable
Relevant Zarr spec discussion: zarr-developers/zarr-specs#351
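To make the first two options concrete, here is a minimal sketch using only the standard library. The helper name `sanitize_attrs` and its `on_invalid` parameter are hypothetical and not part of VirtualiZarr, Icechunk, or Kerchunk; the sketch only assumes that the writers reject values that strict JSON encoding (`allow_nan=False`) rejects.

```python
import json


def sanitize_attrs(attrs, on_invalid="raise"):
    """Hypothetical helper: handle attribute values that strict JSON rejects.

    on_invalid is one of "raise", "drop", or "stringify".
    """
    if on_invalid not in {"raise", "drop", "stringify"}:
        raise ValueError(f"unknown on_invalid option: {on_invalid!r}")

    clean = {}
    for name, value in attrs.items():
        try:
            # allow_nan=False makes json.dumps reject inf, -inf and nan
            json.dumps(value, allow_nan=False)
        except (TypeError, ValueError) as err:
            if on_invalid == "drop":
                continue
            if on_invalid == "stringify":
                clean[name] = str(value)
                continue
            # "raise": name the offending attribute instead of a bare upstream error
            raise ValueError(
                f"attribute {name!r} with value {value!r} "
                f"is not JSON serializable: {err}"
            ) from err
        else:
            clean[name] = value
    return clean


attrs = {"units": "m", "max": float("inf")}
dropped = sanitize_attrs(attrs, on_invalid="drop")
stringified = sanitize_attrs(attrs, on_invalid="stringify")
```

With `on_invalid="raise"` the same call fails, but the message names the attribute (`'max'`), which is roughly what the second option asks for. Casting to a string loses the original type, so a round trip would not compare equal to the source dataset.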
It's slow to debug over the network, so a recommended approach for an MVCE is to download https://nisar.asf.earthdatacloud.nasa.gov/NISAR-SAMPLE-DATA/GCOV/ALOS1_Rosamond_20081012/NISAR_L2_PR_GCOV_001_005_A_219_4020_SHNA_A_20081012T060910_20081012T060926_P01101_F_N_J_001.h5 and reproduce locally:
import xarray as xr
from obstore.store import LocalStore
from virtualizarr import open_virtual_dataset
from virtualizarr.parsers import HDFParser
from virtualizarr.registry import ObjectStoreRegistry
from icechunk import Repository, Storage
# create an in-memory icechunk store
storage = Storage.new_in_memory()
repo = Repository.create(storage=storage)
session = repo.writable_session("main")
url = "file:///Users/max/Documents/Code/zarr-developers/VirtualiZarr/.vscode/data/NISAR_L2_PR_GCOV_001_005_A_219_4020_SHNA_A_20081012T060910_20081012T060926_P01101_F_N_J_001.h5"
hdf_group = "science/LSAR/GCOV/grids/frequencyA"
store = LocalStore()
registry = ObjectStoreRegistry()
registry.register("file://", store)
drop_variables = ["listOfCovarianceTerms", "listOfPolarizations"]
parser = HDFParser(group=hdf_group, drop_variables=drop_variables)
with (
    xr.open_dataset(
        url,
        engine="h5netcdf",
        group=hdf_group,
        drop_variables=drop_variables,
        phony_dims="access",
    ) as dsXR,
    open_virtual_dataset(
        url=url,
        registry=registry,
        parser=parser,
    ) as vds,
):
    vds.vs.to_icechunk(session.store)
    with xr.open_zarr(session.store, zarr_format=3, consolidated=False) as dsV:
        xr.testing.assert_equal(dsXR, dsV)