
The xarray.Dataset.groupby method prepares a dataset for iteration over subsets determined by the group argument. If the group argument is a constant value (i.e. one "subset" that is the whole dataset) broadcast to an xarray.DataArray with matching coordinates, then I expect the first group returned to be identical to the original dataset. That is what happens when all of the dataset coordinates are dimension coordinates. It doesn't happen for this dataset:

import xarray as xr
ds_before = xr.Dataset(coords={'x': ('z', [0]), 'y': ('y', [1, 2])})

which prints as

Dimensions:  (z: 1, y: 2)
Coordinates:
    x        (z) int64 0
  * y        (y) int64 1 2
Dimensions without coordinates: z
Data variables:
    *empty*

Coordinate x is a non-dimension coordinate. If z were replaced with x in the definition above, making x a dimension coordinate, then the ds_after produced by the following would be identical to ds_before.

da = xr.DataArray(True, coords=ds_before.coords)
key, value = next(iter(ds_before.groupby(da)))
ds_after = value.unstack()

The printed representation of ds_after is not identical to ds_before.

Dimensions:  (z: 1, y: 2)
Coordinates:
  * z        (z) int64 0
  * y        (y) int64 1 2
    x        (z, y) int64 0 0
Data variables:
    *empty*

I can work around the new dimension coordinate z, but I don't understand why x has been broadcast along the y dimension. Can you suggest a method for getting x back to its original value?

Ian
  • You’re grouping a dataset in two dimensions (y and z). Any variables will therefore be broadcast against the two dimensions. If you only grouped on y this wouldn’t be the case. So this is expected and there’s no way to set up a multidimensional groupby where this doesn’t happen. – Michael Delgado Dec 24 '22 at 06:48
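The broadcasting described in this comment can be seen in isolation, without groupby. The following is only a minimal sketch: it uses xr.broadcast as a stand-in for what the two-dimensional grouping does to x, and the names x_b, y_b and recovered are introduced here for illustration. It also shows one possible way to collapse the extra y dimension again, assuming x really is constant along y.

```python
import xarray as xr

# Stand-ins for the two coordinates in the question:
# x lives on dimension z, y is a dimension coordinate.
x = xr.DataArray([0], dims="z", name="x")
y = xr.DataArray([1, 2], dims="y", coords={"y": [1, 2]}, name="y")

# Broadcasting against both dimensions gives x an extra y axis,
# with its single value repeated along it.
x_b, y_b = xr.broadcast(x, y)
print(x_b.dims, x_b.shape)

# Assumption: x does not vary along y, so any single y position
# carries the original values. drop=True discards the scalar y coordinate.
recovered = x_b.isel(y=0, drop=True)
print(recovered.dims, recovered.values)
```

Selecting a single position along y only makes sense when the variable is known to have been broadcast along that dimension, which is the case for x in the question.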

1 Answer

The comment by @MichaelDelgado on the question appears correct in part, but there is a way to combine groupby with sel to achieve the desired result.

>>> import xarray as xr
>>> ds_before = xr.Dataset(
...     data_vars={'a': ('y', [0.1, 2.3])},
...     coords={'x': ('z', [0]), 'y': ('y', [1, 2])},
... )
>>> da = xr.DataArray(True, coords=ds_before.coords)
>>> # Take the indexes of the first (here, the only) group.
>>> idx = next(group.unstack().indexes for label, group in da.groupby(da))
>>> ds_after = ds_before.sel(idx)
>>> print(ds_after)
Dimensions:  (y: 2, z: 1)
Coordinates:
    x        (z) int64 0
  * y        (y) int64 1 2
Dimensions without coordinates: z
Data variables:
    a        (y) float64 0.1 2.3

This does not round-trip a dataset through its own groupby method, but it does preserve the original dimensions of a and x. The same approach also preserves the original dimensions when da is not constant, so that there are multiple groups.

Ian