I am trying to organize 3D data collected from several participants, with a different number of samples for each participant. Each participant has a unique session and seat index in the experiment. For each participant i, I have a 3D array of N_i images, each of shape (height, width).
I first tried creating a Dataset of participants, but I ended up with many NaNs because participants have different numbers of samples along the same dimension (the `sample` dim). I then switched to a single DataArray containing all my participants' data concatenated along one dimension, which I call `depth`. This dimension is associated with a MultiIndex coordinate combining the `session`, `seat` and `sample` coordinates:
<xarray.DataArray (depth: 52, height: 4, width: 4)>
array([[[0.92337111, 0.86505447, 0.08541727, 0.74850848],
[0.02336959, 0.0495726 , 0.98745956, 0.58831929],
[0.62128185, 0.7732787 , 0.27716268, 0.83634779],
[0.08146719, 0.35851012, 0.44170263, 0.74338872]],
...
[[0.4365896 , 0.23527988, 0.86891853, 0.94486637],
[0.20884748, 0.81012315, 0.61542411, 0.76706922],
[0.33391262, 0.88955315, 0.25329999, 0.35803887],
[0.49586615, 0.94767265, 0.40868892, 0.42393425]]])
Coordinates:
* height (height) int64 0 1 2 3
* width (width) int64 0 1 2 3
* depth (depth) MultiIndex
- session (depth) int64 0 0 0 0 0 0 0 0 0 0 0 1 1 ... 3 3 3 3 3 3 3 3 3 3 3 3
- seat (depth) int64 0 0 0 0 0 1 1 1 1 1 1 0 0 ... 0 0 0 0 0 1 1 1 1 1 1 1
- sample (depth) int64 0 1 2 3 4 0 1 2 3 4 5 0 1 ... 1 2 3 4 5 0 1 2 3 4 5 6
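For anyone who wants to reproduce this layout, here is a minimal sketch of how such a stacked DataArray can be built. It assumes the per-participant arrays are held in a dict keyed by `(session, seat)`; that dict, its sample counts and the `s{session}_seat{seat}` variable names are hypothetical. The first part approximates the Dataset attempt with `xr.merge` (the original construction may have differed) to show where the NaNs come from: aligning variables of different lengths on a shared `sample` dimension with an outer join pads the shorter ones.

```python
import numpy as np
import xarray as xr

rng = np.random.default_rng(0)
height, width = 4, 4

# Hypothetical input: one (N_i, height, width) array per (session, seat) pair.
# The sample counts are made up for illustration.
n_samples = {(0, 0): 5, (0, 1): 6, (1, 0): 7}
images = {key: rng.random((n, height, width)) for key, n in n_samples.items()}

# First attempt (approximated with xr.merge): one variable per participant,
# all sharing a "sample" dimension. The outer join pads shorter variables with NaN.
per_participant = [
    xr.DataArray(
        arr,
        dims=("sample", "height", "width"),
        coords={"sample": np.arange(arr.shape[0])},
        name=f"s{session}_seat{seat}",  # hypothetical variable names
    )
    for (session, seat), arr in images.items()
]
ds = xr.merge(per_participant, join="outer")
print(int(ds["s0_seat0"].isnull().sum()))  # -> 32 (2 missing samples x 4 x 4 pixels)

# Second attempt: concatenate along a single "depth" dimension and label
# every slice with its session, seat and sample number.
pieces = []
for (session, seat), arr in images.items():
    n = arr.shape[0]
    pieces.append(
        xr.DataArray(
            arr,
            dims=("depth", "height", "width"),
            coords={
                "session": ("depth", np.full(n, session)),
                "seat": ("depth", np.full(n, seat)),
                "sample": ("depth", np.arange(n)),
            },
        )
    )
da = xr.concat(pieces, dim="depth")
# Turn the three coordinates into one MultiIndex on "depth" (no NaN padding).
da = da.set_index(depth=["session", "seat", "sample"])
```

Concatenating along `depth` stores exactly sum(N_i) slices, so nothing has to be padded; the cost is that every query has to go through the MultiIndex.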
However, I find this solution impractical for several reasons:
- each time I want to perform a `groupby`, I have to reset the index and build a new one from the coordinates I want to group by, since xarray does not support grouping by multiple coordinates on the same dimension:
da = da.reset_index('depth')
da = da.set_index(depth=['session', 'seat'])
da.groupby('depth').mean()
- the result of the code above is not ideal, as it does not preserve the MultiIndex level names:
<xarray.DataArray (depth: 8, height: 4, width: 4)>
array([[[0.47795382, 0.67322777, 0.12946181, 0.48983815],
[0.33895882, 0.46772217, 0.62886196, 0.55970122],
[0.57370573, 0.47272117, 0.31529004, 0.63230245],
[0.63230284, 0.5352105 , 0.65805407, 0.65274841]],
...
[[0.55672404, 0.37963945, 0.57334768, 0.64853806],
[0.46608072, 0.39506509, 0.66339553, 0.71447367],
[0.58989461, 0.66066485, 0.53271228, 0.43036214],
[0.44163921, 0.54990042, 0.4229631 , 0.5941268 ]]])
Coordinates:
* height (height) int64 0 1 2 3
* width (width) int64 0 1 2 3
* depth (depth) MultiIndex
- depth_level_0 (depth) int64 0 0 1 1 2 2 3 3
- depth_level_1 (depth) int64 0 1 0 1 0 1 0 1
- I can use `sel` only on fully indexed data (i.e. with `session`, `seat` and `sample` all in the `depth` index), so I end up re-indexing my data again and again.
- I find using `hvplot` on such a DataArray not really straightforward (I am skipping the details here to keep this already long post readable).
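One possible way to group by `(session, seat)` without touching the `depth` index is to attach a combined key as an extra, non-index coordinate along `depth` and group on that. This is a sketch on synthetic data, not an established idiom: the `session_seat` coordinate, its `"session-seat"` string format and the sample counts are all hypothetical choices.

```python
import numpy as np
import xarray as xr

rng = np.random.default_rng(0)

# Synthetic stand-in for the stacked data; the sample counts are hypothetical.
n_samples = {(0, 0): 5, (0, 1): 6, (1, 0): 4, (1, 1): 7}
session = np.concatenate([np.full(n, s) for (s, _), n in n_samples.items()])
seat = np.concatenate([np.full(n, t) for (_, t), n in n_samples.items()])
sample = np.concatenate([np.arange(n) for n in n_samples.values()])
da = xr.DataArray(
    rng.random((session.size, 4, 4)),
    dims=("depth", "height", "width"),
    coords={
        "session": ("depth", session),
        "seat": ("depth", seat),
        "sample": ("depth", sample),
    },
).set_index(depth=["session", "seat", "sample"])

# Build a combined (session, seat) key as an extra, non-index coordinate along
# "depth". The MultiIndex on "depth" is left untouched.
key = [f"{s}-{t}" for s, t in zip(da["session"].values, da["seat"].values)]
da = da.assign_coords(session_seat=("depth", key))

# Group on the key and average over the samples of each participant.
means = da.groupby("session_seat").mean(dim="depth")

# Optionally re-attach named session/seat labels to the grouped result.
labels = [k.split("-") for k in means["session_seat"].values]
means = means.assign_coords(
    session=("session_seat", [int(s) for s, _ in labels]),
    seat=("session_seat", [int(t) for _, t in labels]),
)
```

A string key keeps the grouped labels readable; an integer encoding of `(session, seat)` would work the same way.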
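On the selection side, xarray's `sel` does accept a subset of the MultiIndex levels, and a positional `isel` driven by a mask on any coordinate along `depth` avoids rebuilding the index. The sketch below uses synthetic data with made-up sample counts; how the leftover index is named after a multi-level `sel` may depend on the xarray version, so only sizes are relied on here.

```python
import numpy as np
import xarray as xr

rng = np.random.default_rng(0)

# Synthetic stand-in for the stacked data; the sample counts are hypothetical.
n_samples = {(0, 0): 5, (0, 1): 6, (1, 0): 4, (1, 1): 7}
session = np.concatenate([np.full(n, s) for (s, _), n in n_samples.items()])
seat = np.concatenate([np.full(n, t) for (_, t), n in n_samples.items()])
sample = np.concatenate([np.arange(n) for n in n_samples.values()])
da = xr.DataArray(
    rng.random((session.size, 4, 4)),
    dims=("depth", "height", "width"),
    coords={
        "session": ("depth", session),
        "seat": ("depth", seat),
        "sample": ("depth", sample),
    },
).set_index(depth=["session", "seat", "sample"])

# sel accepts a subset of the MultiIndex levels (partial selection).
one_session = da.sel(session=1)  # every sample of both seats in session 1
# Selecting two of the three levels; the leftover "sample" level may become
# the index (and possibly the dimension name) of the result.
one_participant = da.sel(session=1, seat=0)

# Any coordinate along "depth" (index level or not) can drive a positional
# selection through isel, without rebuilding the index.
positions = np.flatnonzero(da["sample"].values < 3)
first_three = da.isel(depth=positions)  # first 3 samples of every participant
```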
Is there something I am missing? Is there a better way to organize my data? I tried to create multiple indexes on the same dimension for convenience, but without success.