
I have two-dimensional data and a set of two-dimensional bins generated with scipy.stats.binned_statistic_2d. For each data point, I want the index of the bin it occupies. This is exactly what np.digitize is for, but as far as I can tell, it only handles one-dimensional data. This Stack Exchange question seems to have an answer, but it is fully generalized to n dimensions. Is there a more straightforward solution for two dimensions?

Alex

2 Answers


You can already get the bin index of each observation from the fourth return value (binnumber) of scipy.stats.binned_statistic_2d:

Returns:  
  statistic : (nx, ny) ndarray
      The values of the selected statistic in each two-dimensional bin.
  xedges : (nx + 1) ndarray
      The bin edges along the first dimension.
  yedges : (ny + 1) ndarray
      The bin edges along the second dimension.
  binnumber : (N,) array of ints or (2,N) ndarray of ints
      This assigns to each element of sample an integer that
      represents the bin in which this observation falls. The
      representation depends on the expand_binnumbers argument.
      See Notes for details.
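
As a minimal sketch of how binnumber can be read, the snippet below uses small made-up sample points and edges (the names binx, biny, ix, and iy are only illustrative). It assumes SciPy 0.17.0 or later so that expand_binnumbers is available. With expand_binnumbers=True, binnumber has shape (2, N), and an index i along an axis means the point lies between edges[i-1] and edges[i], so subtracting 1 gives zero-based indices into statistic.

```python
import numpy as np
from scipy.stats import binned_statistic_2d

# Illustrative sample points and bin edges (not from the question)
x = np.array([0.1, 0.1, 0.1, 0.6])
y = np.array([2.1, 2.6, 2.1, 2.1])
binx = [0.0, 0.5, 1.0]
biny = [2.0, 2.5, 3.0]

# statistic="count" needs no values array, so None is passed for it
result = binned_statistic_2d(
    x, y, None, statistic="count",
    bins=[binx, biny], expand_binnumbers=True,
)

# Row 0 holds the x-bin number of each point, row 1 the y-bin number.
# Numbers start at 1 for in-range points; 0 and len(edges) mark outliers.
ix = result.binnumber[0] - 1
iy = result.binnumber[1] - 1

# Look up the count of the bin each point falls into
counts_per_point = result.statistic[ix, iy]
print(np.column_stack((ix, iy)))
print(counts_per_point)
```

The subtraction by 1 is only safe when every point lies inside the outermost edges, as it does for this sample data. Outliers would need to be masked first.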
Michele
ali_m
    With the option `expand_binnumbers=True` (new in version 0.17.0), the meaning of binnumber changes ([see doc](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.binned_statistic_2d.html)), which can make it easier to use depending on the application. – Michele Mar 12 '20 at 14:22

A simple solution using NumPy:

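import numpy as np

bins = [[0.3, 0.5, 0.7], [0.3, 0.7]]  # bin edges for the x and y axes
values = np.random.random((10, 2))
# Digitize each column against its own set of edges
digitized = [np.digitize(values[:, i], bins[i], right=False) for i in range(len(bins))]
# Stack the per-axis indices as columns so row k is (x_bin, y_bin) for point k;
# concatenating and then reshaping would mix x and y indices across rows
digitized = np.column_stack(digitized)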
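
If a single integer per point is preferred over an (x_bin, y_bin) pair, the per-axis indices can be combined with np.ravel_multi_index. The following is only a sketch: the helper name digitize_2d and the sample points are hypothetical. It relies on np.digitize returning values from 0 to len(edges) inclusive, so each axis has len(edges) + 1 possible indices.

```python
import numpy as np

def digitize_2d(points, x_edges, y_edges):
    """Return per-axis bin indices and a flat bin index for (N, 2) points.

    Hypothetical helper for illustration, built on np.digitize.
    """
    points = np.asarray(points)
    ix = np.digitize(points[:, 0], x_edges)
    iy = np.digitize(points[:, 1], y_edges)
    # np.digitize yields 0..len(edges), so each axis has len(edges) + 1 slots
    shape = (len(x_edges) + 1, len(y_edges) + 1)
    flat = np.ravel_multi_index((ix, iy), shape)
    return np.column_stack((ix, iy)), flat

points = np.array([[0.1, 0.1], [0.4, 0.5], [0.9, 0.8]])
pairs, flat = digitize_2d(points, [0.3, 0.5, 0.7], [0.3, 0.7])
print(pairs)  # rows are (x_bin, y_bin): (0, 0), (1, 1), (3, 2)
print(flat)   # 0, 4, 11
```

Note that this numbering follows np.digitize (0 means below the first edge), which differs from the binnumber convention used by scipy.stats.binned_statistic_2d.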
Alfredo