Regression in xarray + icechunk test - now opening data via dask #688

@TomNicholas

Description

As of xarray 2025.7.0, some of our Icechunk roundtrip tests fail because they now open data as dask arrays instead of xarray's lazily indexed arrays. The simplest failing test is virtualizarr/tests/test_writers/test_icechunk.py::test_set_single_virtual_ref_with_encoding.
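To make the symptom concrete, here is a minimal sketch of a check for which variables came back dask-backed. It assumes only that each variable exposes a `.chunks` attribute that is `None` when the data is not chunked, which is how xarray's `Variable.chunks` is documented to behave. The helper name `dask_backed_variables` is hypothetical and is not part of xarray or VirtualiZarr.

```python
from typing import Any, Hashable, Mapping


def dask_backed_variables(variables: Mapping[Hashable, Any]) -> list:
    """Return the names of variables that report chunking.

    Assumption: a variable is treated as dask-backed when its `.chunks`
    attribute exists and is not None. Reading `.chunks` does not load data.
    """
    return [
        name
        for name, var in variables.items()
        if getattr(var, "chunks", None) is not None
    ]
```

With a dataset `ds` returned by `xr.open_dataset(...)`, calling `dask_backed_variables(ds.variables)` would be expected to return an empty list when nothing was opened via dask, so a non-empty result in the failing test would confirm the regression described above.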

I'm having trouble reducing the issue to a smaller reproducer. It seems to require all of these factors to be involved:

  • Some decoding to happen (the similar test_icechunk.py::test_set_single_virtual_ref_without_encoding test passes)
  • Icechunk to be involved (replacing Icechunk with a zarr MemoryStore makes it pass)
  • Virtual references to be involved (rewriting the test to write native chunks instead of virtual chunks makes it pass)

For the moment we have pinned our xarray dependency to avoid this (in #673), but we do need to fix it.

My suspicion is that something in xarray has effectively changed the default behaviour of the chunks kwarg to open_dataset, but it somehow only triggers in a very specific situation involving encoding.

My next step is to run git bisect on xarray.
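A bisect like this can be automated with `git bisect run`, which treats exit code 0 as good, 125 as "skip this commit", and other codes from 1 to 127 as bad. The sketch below maps pytest's exit code onto those values. It assumes xarray is installed in editable mode so that each bisect step tests the checked-out commit, and that `cwd` points at a VirtualiZarr checkout; the function names and the `cwd` parameter are hypothetical.

```python
import subprocess
import sys
from typing import Optional

# Test ID taken from the issue text above.
TEST_ID = (
    "virtualizarr/tests/test_writers/test_icechunk.py"
    "::test_set_single_virtual_ref_with_encoding"
)

# Exit codes understood by `git bisect run`.
GOOD, BAD, SKIP = 0, 1, 125


def bisect_exit_code(pytest_returncode: int) -> int:
    """Translate a pytest exit code into a `git bisect run` exit code."""
    if pytest_returncode == 0:  # all tests passed
        return GOOD
    if pytest_returncode == 1:  # tests ran and at least one failed
        return BAD
    # Anything else (usage error, nothing collected, crash) says nothing
    # about the regression, so ask git to skip this commit.
    return SKIP


def main(test_id: str = TEST_ID, cwd: Optional[str] = None) -> int:
    """Run the failing test once and return the code for `git bisect run`."""
    result = subprocess.run(
        [sys.executable, "-m", "pytest", "-x", "-q", test_id],
        cwd=cwd,  # assumed to be the VirtualiZarr checkout
    )
    return bisect_exit_code(result.returncode)


# To use as a script, end the file with: raise SystemExit(main())
```

Saved under a hypothetical name such as `bisect_check.py`, this could be driven from the xarray checkout with `git bisect start`, marking one known-bad and one known-good commit, and then `git bisect run python /path/to/bisect_check.py`. Returning 125 for anything other than a clean pass or a test failure keeps commits where the environment itself is broken from being blamed for the regression.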

Metadata

    Labels

    • Icechunk 🧊: Relates to Icechunk library / spec
    • bug: Something isn't working
    • upstream issue
    • xarray: Requires changes to xarray upstream
