
How to handle non-JSON serializable attributes? #715

@maxrjones


The NISAR test currently fails because one of its attributes has the value inf (the float), which leads to "ValueError: Out of range float values are not JSON compliant: inf" when writing to either Icechunk or Kerchunk. How should we handle non-JSON-serializable attributes with Zarr V3? Some options:

  • Add a parameter to to_icechunk and to_kerchunk that lets the user choose to raise an error, drop the attribute, or cast it to a string
  • Catch the upstream error and raise a more informative error that identifies which variable/attribute is causing the issue
  • Defer to the parsers and document the requirement that attribute values be JSON serializable
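The first two options could share one code path. Below is a minimal sketch of how a "raise / drop / cast to string" switch might behave, with the raise branch naming the variable and attribute as the second option suggests. The helper sanitize_attrs and its on_invalid and where parameters are hypothetical and not part of VirtualiZarr's API.

```python
import json
from typing import Any, Dict, Literal

# Hypothetical modes mirroring the first option: raise, drop, or cast to string.
OnInvalid = Literal["raise", "drop", "str"]


def _strict_json_ok(value: Any) -> bool:
    """True if `value` serializes under strict JSON (rejects NaN/inf and unknown types)."""
    try:
        json.dumps(value, allow_nan=False)
    except (TypeError, ValueError):
        return False
    return True


def sanitize_attrs(
    attrs: Dict[str, Any], *, on_invalid: OnInvalid = "raise", where: str = "<dataset>"
) -> Dict[str, Any]:
    """Return a copy of `attrs` with strict-JSON-incompatible values handled.

    on_invalid="raise": ValueError naming `where` (e.g. a variable) and the attribute
    on_invalid="drop":  omit the attribute
    on_invalid="str":   replace the value with str(value), e.g. inf -> "inf"
    """
    if on_invalid not in ("raise", "drop", "str"):
        raise ValueError(f"unknown on_invalid mode: {on_invalid!r}")
    cleaned: Dict[str, Any] = {}
    for key, value in attrs.items():
        if _strict_json_ok(value):
            cleaned[key] = value
        elif on_invalid == "raise":
            raise ValueError(
                f"Attribute {key!r} on {where} has value {value!r}, "
                "which strict JSON cannot represent"
            )
        elif on_invalid == "str":
            cleaned[key] = str(value)
        # on_invalid == "drop": the attribute is skipped entirely
    return cleaned
```

For example, with attrs = {"units": "m", "valid_max": float("inf")}, on_invalid="drop" returns {"units": "m"} and on_invalid="str" returns {"units": "m", "valid_max": "inf"}. Using json.dumps(..., allow_nan=False) as the check keeps the helper consistent with strict JSON instead of special-casing inf and NaN; the trade-off of the "str" mode is that the value comes back as a string, not a float, on read.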

Relevant Zarr spec discussion: zarr-developers/zarr-specs#351
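The error above has the same message Python's standard json module raises when allow_nan=False is set, since strict JSON has no literal for inf or NaN. A stdlib-only check like the following reproduces the behavior without the NISAR file; whether the Icechunk/Kerchunk write path serializes attributes in exactly this way is an assumption here.

```python
import json
import math

attrs = {"units": "m", "valid_max": math.inf}

# By default json.dumps emits the non-standard token Infinity.
print(json.dumps(attrs))  # {"units": "m", "valid_max": Infinity}

# Strict mode rejects the value with a ValueError like the one in the NISAR test.
try:
    json.dumps(attrs, allow_nan=False)
except ValueError as err:
    print(err)
```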

Debugging over the network is slow, so for an MVCE I recommend downloading https://nisar.asf.earthdatacloud.nasa.gov/NISAR-SAMPLE-DATA/GCOV/ALOS1_Rosamond_20081012/NISAR_L2_PR_GCOV_001_005_A_219_4020_SHNA_A_20081012T060910_20081012T060926_P01101_F_N_J_001.h5 and reproducing locally:

import xarray as xr
from obstore.store import LocalStore

from virtualizarr import open_virtual_dataset
from virtualizarr.parsers import HDFParser
from virtualizarr.registry import ObjectStoreRegistry
from icechunk import Repository, Storage

# create an in-memory icechunk store
storage = Storage.new_in_memory()
repo = Repository.create(storage=storage)
session = repo.writable_session("main")

url = "file:///Users/max/Documents/Code/zarr-developers/VirtualiZarr/.vscode/data/NISAR_L2_PR_GCOV_001_005_A_219_4020_SHNA_A_20081012T060910_20081012T060926_P01101_F_N_J_001.h5"
hdf_group = "science/LSAR/GCOV/grids/frequencyA"
store = LocalStore()
registry = ObjectStoreRegistry()
registry.register("file://", store)
drop_variables = ["listOfCovarianceTerms", "listOfPolarizations"]
parser = HDFParser(group=hdf_group, drop_variables=drop_variables)
with (
    xr.open_dataset(
        url,
        engine="h5netcdf",
        group=hdf_group,
        drop_variables=drop_variables,
        phony_dims="access",
    ) as dsXR,
    open_virtual_dataset(
        url=url,
        registry=registry,
        parser=parser,
    ) as vds,
):
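    # This write fails with: ValueError: Out of range float values are not JSON compliant: inf
    vds.vs.to_icechunk(session.store)

    # Read the virtual store back and compare it with xarray's direct read of the HDF5 file
    with xr.open_zarr(session.store, zarr_format=3, consolidated=False) as dsV:
        xr.testing.assert_equal(dsXR, dsV)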
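To see which attributes are responsible before calling to_icechunk, a scan like the one below could be run on vds. It is a sketch assuming xarray's Dataset layout (ds.attrs holds global attributes and ds.variables maps names to variables with .attrs); find_invalid_attrs is a hypothetical name. Because it uses strict json.dumps as the test, it may also flag values such as NumPy integer scalars or arrays that a Zarr writer would convert before serializing.

```python
import json
from typing import Any, Iterator, Tuple


def _strict_json_ok(value: Any) -> bool:
    """True if `value` serializes under strict JSON (no NaN/inf, no unknown types)."""
    try:
        json.dumps(value, allow_nan=False)
    except (TypeError, ValueError):
        return False
    return True


def find_invalid_attrs(ds: Any) -> Iterator[Tuple[str, str, Any]]:
    """Yield (location, attribute name, value) for every attribute strict JSON rejects.

    Assumes an xarray-like object: `ds.attrs` is a dict of dataset-level
    attributes and `ds.variables` maps variable names to objects with `.attrs`.
    """
    for key, value in ds.attrs.items():
        if not _strict_json_ok(value):
            yield ("<dataset>", key, value)
    for name, var in ds.variables.items():
        for key, value in var.attrs.items():
            if not _strict_json_ok(value):
                yield (str(name), key, value)
```

Usage, inside the with block before the write: for where, key, value in find_invalid_attrs(vds): print(f"{where}: {key} = {value!r}").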
