Consider a simple record array structure:
import numpy as np
ijv_dtype = [
('I', 'i'),
('J', 'i'),
('v', 'd'),
]
ijv = np.array([
(0, 0, 3.3),
(0, 1, 1.1),
(0, 1, 4.4),
(1, 1, 2.2),
], ijv_dtype)
print(ijv) # [(0, 0, 3.3) (0, 1, 1.1) (0, 1, 4.4) (1, 1, 2.2)]
I'd like to aggregate certain statistics (sum, min, max, etc.) from v
by grouping unique combinations of I
and J
. Thinking from SQL, the expected result is:
select i, j, sum(v) as v from ijv group by i, j;
i | j | v
---+---+-----
0 | 0 | 3.3
0 | 1 | 5.5
1 | 1 | 2.2
(the order is not important)
The best I can think up for NumPy is ugly, and I'm not confident I've ordered the result correctly (although it seems to work here):
# Get unique groups, index and inverse
u_ij, idx_ij, inv_ij = np.unique(ijv[['I', 'J']], return_index=True, return_inverse=True)
# Assemble aggregate
a_ijv = np.zeros(len(u_ij), ijv_dtype)
a_ijv['I'] = u_ij['I']
a_ijv['J'] = u_ij['J']
a_ijv['v'] = [ijv['v'][inv_ij == i].sum() for i in range(len(u_ij))]
print(a_ijv) # [(0, 0, 3.3) (0, 1, 5.5) (1, 1, 2.2)]
I'd like to think there is a better way to do this! I'm using NumPy 1.4.1.