I have two-dimensional data and a set of two-dimensional bins generated with `scipy.stats.binned_statistic_2d`. For each data point, I want the index of the bin it occupies. This is exactly what `np.digitize` is for, but as far as I can tell, it only handles one-dimensional data. This Stack Exchange question seems to have an answer, but it is fully generalized to n dimensions. Is there a more straightforward solution for two dimensions?
whoops! yep! thanks a lot. – Alex Jul 27 '15 at 14:50
2 Answers
You can already get the bin index of each observation from the fourth return value of `scipy.stats.binned_statistic_2d`:
Returns:
statistic : (nx, ny) ndarray
    The values of the selected statistic in each two-dimensional bin.
xedges : (nx + 1) ndarray
    The bin edges along the first dimension.
yedges : (ny + 1) ndarray
    The bin edges along the second dimension.
binnumber : (N,) array of ints or (2,N) ndarray of ints
    This assigns to each element of sample an integer that represents the bin in which this observation falls. The representation depends on the expand_binnumbers argument. See Notes for details.
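As a minimal sketch of reading that fourth return value, the snippet below uses small made-up inputs (the `x`, `y`, `binx`, and `biny` values are illustrative, not from the question). With the default settings, `binnumber` is a single linearized index per point over a grid that has one extra outlier bin on each side of every axis, so `np.unravel_index` with shape `(nx + 2, ny + 2)` should recover the per-axis indices.

```python
import numpy as np
from scipy import stats

# Illustrative data and bin edges (assumed values, not from the question)
x = [0.1, 0.1, 0.1, 0.6]
y = [2.1, 2.6, 2.1, 2.1]
binx = [0.0, 0.5, 1.0]
biny = [2.0, 2.5, 3.0]

# values=None is accepted when the statistic is "count"
result = stats.binned_statistic_2d(x, y, None, "count", bins=[binx, biny])
binnumber = result.binnumber  # same object as result[3]

# The linearized index refers to a grid padded with outlier bins,
# so each axis has (number of bins + 2) slots.
nx, ny = len(binx) - 1, len(biny) - 1
ix, iy = np.unravel_index(binnumber, (nx + 2, ny + 2))

print(binnumber)
print(ix, iy)
```

Here index 0 and index `nx + 1` (or `ny + 1`) along an axis mark points that fall outside the edges, so in-range points get indices starting at 1.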
With the option `expand_binnumbers=True` (new in version 0.17.0), the meaning of binnumber changes ([see doc](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.binned_statistic_2d.html)), which can make it easier to use depending on the application. – Michele Mar 12 '20 at 14:22
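A rough sketch of the `expand_binnumbers=True` variant mentioned in the comment, assuming random data in [0, 1) and hand-picked edges (both are illustrative choices). In this mode `binnumber` should have shape `(2, N)`, with one row of bin indices per axis, and for points inside the edges each row is expected to match what `np.digitize` gives for that axis.

```python
import numpy as np
from scipy import stats

# Illustrative data: 10 random points in the unit square (assumed, seeded)
rng = np.random.default_rng(0)
x = rng.random(10)
y = rng.random(10)

# Illustrative bin edges covering the full data range
xedges = [0.0, 0.3, 0.5, 0.7, 1.0]
yedges = [0.0, 0.3, 0.7, 1.0]

result = stats.binned_statistic_2d(
    x, y, None, "count", bins=[xedges, yedges], expand_binnumbers=True
)

# Row 0 holds the bin index along x, row 1 the bin index along y
x_idx, y_idx = result.binnumber

# For in-range points, the same indices come from np.digitize per axis
x_check = np.digitize(x, xedges)
y_check = np.digitize(y, yedges)
print(np.array_equal(x_idx, x_check), np.array_equal(y_idx, y_check))
```

As in the default mode, the indices start at 1 for the first real bin, because 0 is reserved for points below the first edge.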
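A simple solution using NumPy:
import numpy as np

bins = [[0.3, 0.5, 0.7], [0.3, 0.7]]  # bin edges along x, then y
values = np.random.random((10, 2))
digitized = []
for i in range(len(bins)):
    # Digitize column i against its own edges
    digitized.append(np.digitize(values[:, i], bins[i], right=False))
# Stack as columns so that row k holds the (x, y) bin indices of point k
digitized = np.column_stack(digitized)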
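If a single bin id per point is more convenient than an (x, y) pair, one possible follow-up is to combine the per-axis indices with `np.ravel_multi_index`. This is only a sketch under the same illustrative edges as above; the names `per_axis`, `shape`, and `flat` are made up for the example.

```python
import numpy as np

bins = [[0.3, 0.5, 0.7], [0.3, 0.7]]  # same illustrative edges as above
rng = np.random.default_rng(0)
values = rng.random((10, 2))

# One array of bin indices per axis
per_axis = [np.digitize(values[:, i], bins[i]) for i in range(len(bins))]

# np.digitize returns indices in 0..len(edges), so each axis
# has len(edges) + 1 possible slots.
shape = tuple(len(b) + 1 for b in bins)

# Combine the (x, y) index pairs into one flat id per point (row-major)
flat = np.ravel_multi_index(tuple(per_axis), shape)

# np.unravel_index reverses the mapping back to per-axis indices
recovered = np.unravel_index(flat, shape)
```

The flat ids can then be used directly with things like `np.bincount(flat, minlength=shape[0] * shape[1])` to count points per bin.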

Alfredo