
I am trying to organize 3D data collected from several participants, with a different number of samples for each participant. Each participant has a unique session and seat index in the experiment. For each participant i, I have a 3D array made of N_i images, each of size height × width.

I first tried creating a Dataset of participants, but I ended up with many NaNs because participants have different numbers of samples along the same dimension (the sample dim). I then switched to a single DataArray containing all my participants' data concatenated along one dimension that I call depth. This dimension is associated with a MultiIndex coordinate combining the session, seat and sample coordinates:

<xarray.DataArray (depth: 52, height: 4, width: 4)>
array([[[0.92337111, 0.86505447, 0.08541727, 0.74850848],
        [0.02336959, 0.0495726 , 0.98745956, 0.58831929],
        [0.62128185, 0.7732787 , 0.27716268, 0.83634779],
        [0.08146719, 0.35851012, 0.44170263, 0.74338872]],
...
       [[0.4365896 , 0.23527988, 0.86891853, 0.94486637],
        [0.20884748, 0.81012315, 0.61542411, 0.76706922],
        [0.33391262, 0.88955315, 0.25329999, 0.35803887],
        [0.49586615, 0.94767265, 0.40868892, 0.42393425]]])
Coordinates:
  * height   (height) int64 0 1 2 3
  * width    (width) int64 0 1 2 3
  * depth    (depth) MultiIndex
  - session  (depth) int64 0 0 0 0 0 0 0 0 0 0 0 1 1 ... 3 3 3 3 3 3 3 3 3 3 3 3
  - seat     (depth) int64 0 0 0 0 0 1 1 1 1 1 1 0 0 ... 0 0 0 0 0 1 1 1 1 1 1 1
  - sample   (depth) int64 0 1 2 3 4 0 1 2 3 4 5 0 1 ... 1 2 3 4 5 0 1 2 3 4 5 6
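
For reference, here is a minimal sketch of how an array with this layout can be built. The per-participant sample counts and the random data are placeholders chosen for illustration (so depth is 22 here rather than 52); only the dimension and coordinate names match my real data.

```python
import numpy as np
import xarray as xr

rng = np.random.default_rng(0)

# Hypothetical sample counts per (session, seat) pair; placeholders only.
n_samples = {(0, 0): 5, (0, 1): 6, (1, 0): 4, (1, 1): 7}

# Build one flat label list per MultiIndex level along "depth".
session, seat, sample = [], [], []
for (s, t), n in n_samples.items():
    session += [s] * n
    seat += [t] * n
    sample += list(range(n))

da = xr.DataArray(
    rng.random((len(sample), 4, 4)),  # random stand-in for the images
    dims=("depth", "height", "width"),
    coords={
        "height": np.arange(4),
        "width": np.arange(4),
        "session": ("depth", session),
        "seat": ("depth", seat),
        "sample": ("depth", sample),
    },
)

# Combine the three per-sample coordinates into a MultiIndex on "depth".
da = da.set_index(depth=["session", "seat", "sample"])
```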

However, I find this solution hard to use for several reasons:

  • each time I want to perform a groupby, I have to reset the index and recreate one from the coordinates I want to group by, since xarray does not support grouping by several coordinates on the same dim:
da = da.reset_index('depth')
da = da.set_index(depth=['session', 'seat'])
da.groupby('depth').mean()
  • the result of the code above is not ideal, as it loses the MultiIndex level names (session and seat become depth_level_0 and depth_level_1):
<xarray.DataArray (depth: 8, height: 4, width: 4)>
array([[[0.47795382, 0.67322777, 0.12946181, 0.48983815],
        [0.33895882, 0.46772217, 0.62886196, 0.55970122],
        [0.57370573, 0.47272117, 0.31529004, 0.63230245],
        [0.63230284, 0.5352105 , 0.65805407, 0.65274841]],
...
       [[0.55672404, 0.37963945, 0.57334768, 0.64853806],
        [0.46608072, 0.39506509, 0.66339553, 0.71447367],
        [0.58989461, 0.66066485, 0.53271228, 0.43036214],
        [0.44163921, 0.54990042, 0.4229631 , 0.5941268 ]]])
Coordinates:
  * height         (height) int64 0 1 2 3
  * width          (width) int64 0 1 2 3
  * depth          (depth) MultiIndex
  - depth_level_0  (depth) int64 0 0 1 1 2 2 3 3
  - depth_level_1  (depth) int64 0 1 0 1 0 1 0 1
  • I can use sel only on fully indexed data (i.e. by using session, seat and sample in the depth index), so I end up re-indexing my data again and again.
  • I find using hvplot on such a DataArray not really straightforward (skipping the details here to keep this already long post readable).

Is there something I am missing? Is there a better way to organize my data? I tried to create multiple indexes on the same dim for convenience, but without success.
