I am desperately trying to split strings within an xarray.DataArray.
Every element of the array should be split into fixed-width pieces, e.g.
"aaabbbccc" --> ["aaa", "bbb", "ccc"]
Fortunately, such a function already exists in the textwrap library (wrap()), but applying it to my DataArray is a different story:
xds = riox.open_rasterio(fp_output_tmp_mlsieved, chunks = "auto")
<xarray.DataArray (band: 1, y: 2, x: 2)>
dask.array<transpose, shape=(1, 2, 2), dtype=<U18, chunksize=(1, 2, 2), chunktype=numpy.ndarray>
Coordinates:
* band (band) int64 1
* x (x) float64 3.077e+06 3.077e+06 ... 3.077e+06 3.077e+06
* y (y) float64 1.865e+06 1.865e+06 ... 1.865e+06 1.865e+06
spatial_ref int64 0
Once loaded, it looks like this:
array([[['000000000000000000', '000000000000000000'],
['000000000000000000', '000000000000000000']]], dtype='<U18')
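To make the target concrete, here is a minimal sketch of the per-element operation in plain Python. The helper name split_fixed_width is only for illustration, and it assumes the strings contain no whitespace, since wrap() would otherwise break and strip at spaces:

```python
from textwrap import wrap


def split_fixed_width(s: str, width: int) -> list[str]:
    """Split s into consecutive pieces of at most `width` characters."""
    # wrap() breaks a single long "word" into chunks of exactly `width`
    # characters (the last chunk may be shorter).
    return wrap(s, width)


pieces = split_fixed_width("aaabbbccc", 3)
print(pieces)  # ['aaa', 'bbb', 'ccc']
```

Each of the 18-character strings above would therefore turn into 6 pieces when split with a width of 3.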
I think a solution is to apply it with xr.apply_ufunc(). I have managed to do that with a simpler numpy function before, but with wrap() all I get is a bunch of errors. I think the main issue is that it is not a vectorized numpy function, and second, that I can't get the dimensions to work out. My latest attempt looks like this:
import numpy as np
import xarray as xr
from textwrap import wrap

def decompressor(s, l):
    # s arrives as a single-element array; wrap() needs a plain str
    return np.array(wrap(s.item(), l))

def ufunc_decompressor(s, l):
    return xr.apply_ufunc(
        decompressor,
        s, l,
        output_dtypes=[np.dtype(f"U{l}")],
        input_core_dims=[["band"], []],
        output_core_dims=[["band"]],
        exclude_dims=set(("band",)),
        dask="parallelized",
        vectorize=True
    )

xds_split = ufunc_decompressor(xds, 3).load()
What I get is a cryptic error:
File "/home/.../miniconda3/envs/postproc/lib/python3.10/site-packages/dask/array/gufunc.py", line 489, in <genexpr>
core_output_shape = tuple(core_shapes[d] for d in ocd)
KeyError: 'dim0'
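For comparison, the result I am after can be produced eagerly with plain numpy on a small in-memory array. This is only a sketch: the sample values and the names arr and parts are made up for illustration, it assumes every string has the same length and contains no whitespace, and it bypasses dask entirely, so it does not address the chunked case:

```python
from textwrap import wrap

import numpy as np

# Made-up stand-in for the loaded data: shape (band=1, y=2, x=2), dtype <U18
arr = np.array(
    [[["000111222333444555", "000000000000000000"],
      ["aaabbbcccdddeeefff", "000000000000000000"]]],
    dtype="<U18",
)
width = 3

# Wrap every string, then place the pieces on a new trailing axis
flat = [wrap(s, width) for s in arr.ravel().tolist()]
parts = np.array(flat).reshape(arr.shape + (-1,))

print(parts.shape)  # (1, 2, 2, 6)
print(parts[0, 0, 0])
```

So the output needs an axis of length 6 that the input does not have, and that is the part I cannot seem to express through apply_ufunc().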