Let's say I have this simple array:

data = [1,1,0,0,1,1,1]

I can apply labeling to this data with the scipy ndimage module with:

groups, _ = sp.ndimage.label(data)

Resulting in

In [68]: print(groups)
[1 1 0 0 2 2 2]

Now, I would like to apply the same labeling function to an xarray DataArray.

xr_data = xr.DataArray([1,1,0,0,1,1,1], coords = [("x", [0,1,2,3,4,5,6])])

I know I could call the same function as before on xr_data, but that call returns a numpy array, which for my actual dataset is too large to fit in memory.

It seems like the xr.apply_ufunc function is what I need. However, I am having trouble getting it to work.

def xr_label(arr):
    return xr.apply_ufunc(sp.ndimage.label, arr)

xr_groups, _ = xr_label(xr_data)

This results in: "ValueError: applied function returned data with unexpected number of dimensions. Received 0 dimension(s) but expected 1 dimensions with names: ('x',)"

I'm finding the documentation on the apply_ufunc method difficult to interpret. Can someone help me out with this?

hm8

1 Answer

You have to define input_core_dims and output_core_dims as parameters to apply_ufunc. See the documentation at: http://xarray.pydata.org/en/stable/generated/xarray.apply_ufunc.html

In your case I think this will be:

xr.apply_ufunc(sp.ndimage.label, arr, input_core_dims=[['x']], output_core_dims=[['x']])

I also recently struggled with understanding apply_ufunc (to be fair, I still don't fully understand it), but the example at http://xarray.pydata.org/en/stable/examples/apply_ufunc_vectorize_1d.html helped me a lot.

Peter
    Thanks! The only modification I had to make was adding an empty list to the output dimensions: `output_core_dims=[['x'], []]`, because the label function returns two things: an array with each element's label, and a single integer giving the number of groups found. – hm8 Dec 07 '20 at 16:34
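
Putting the answer and the comment above together, a minimal sketch of the working call might look like the following. It assumes the 1-D example from the question; the `dim` parameter and the name `n_groups` are illustrative additions, not part of the original code, and this has not been checked against larger or multi-dimensional data.

```python
import xarray as xr
from scipy import ndimage

# The example data from the question.
xr_data = xr.DataArray([1, 1, 0, 0, 1, 1, 1], coords=[("x", [0, 1, 2, 3, 4, 5, 6])])


def xr_label(arr, dim="x"):
    # `dim` is a hypothetical convenience parameter for this sketch.
    # ndimage.label returns two things: the labeled array and the number of
    # groups found, so output_core_dims needs one entry per return value.
    # The labeled array keeps the core dimension; the count is a scalar,
    # hence the empty list.
    return xr.apply_ufunc(
        ndimage.label,
        arr,
        input_core_dims=[[dim]],
        output_core_dims=[[dim], []],
    )


xr_groups, n_groups = xr_label(xr_data)
print(xr_groups.values)
print(int(n_groups))
```

Here `xr_groups` comes back as a DataArray that keeps the `x` coordinate, and `n_groups` is a zero-dimensional DataArray holding the group count.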