Suppose I have an array X and a list of indices k_ar, the maximum of which is K - 1.
What I want to do is basically split X in such a way that X[i] goes into sub-array k_ar[i]. The O(n) way to do this would be the following:
X = [5, 1, 3, 2, 2, 1]
k_ar = [0, 1, 1, 0, 1, 2]  # one bucket index per element of X
K = max(k_ar) + 1  # number of sub-arrays
sub_X = [[] for k in range(K)]
for k, x in zip(k_ar, X):
    sub_X[k].append(x)  # X[i] goes into bucket k_ar[i]
Although this is the ideal algorithm for this kind of thing, I was wondering if NumPy, SciPy or any other library had a faster way of doing it. I could, for example, do this, but it is O(nK) instead of O(n), and so sub-optimal for large K, although each pass over the n elements is very fast:
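import numpy as np
X = np.array([5, 1, 3, 2, 2, 1], dtype=np.int8)
k_ar = np.array([0, 1, 1, 0, 1, 2], dtype=np.int8)
K = int(k_ar.max()) + 1  # number of sub-arrays
sub_X = np.empty(K, dtype=object)  # object array holding one ndarray per bucket
for k in range(K):
    sub_X[k] = X[k_ar == k]  # boolean mask scans all n elements for each k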
So, again, is there a way of speeding this up without using e.g. Numba, Cython or PyPy?
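For reference, one NumPy-only candidate I can think of is a stable argsort on k_ar followed by np.split at the bucket boundaries. This is only a minimal sketch, assuming k_ar holds non-negative integers and has the same length as X; the helper name split_by_index is made up for illustration. It avoids the loop over K, but the sort makes it O(n log n) in general rather than O(n), so whether it actually beats the plain loop would need to be measured.

```python
import numpy as np

def split_by_index(X, k_ar):
    """Group X[i] into bucket k_ar[i] (hypothetical helper, not a NumPy API)."""
    X = np.asarray(X)
    k_ar = np.asarray(k_ar)
    # A stable sort keeps the original order of elements within each bucket.
    order = np.argsort(k_ar, kind="stable")
    # counts[k] is the number of elements assigned to bucket k.
    counts = np.bincount(k_ar)
    # Cumulative counts, minus the last one, give the split positions.
    boundaries = np.cumsum(counts)[:-1]
    return np.split(X[order], boundaries)

X = np.array([5, 1, 3, 2, 2, 1])
k_ar = np.array([0, 1, 1, 0, 1, 2])
sub_X = split_by_index(X, k_ar)
print([a.tolist() for a in sub_X])  # [[5, 2], [1, 3, 2], [1]]
```

The result is a plain Python list of arrays, one per bucket index from 0 to K - 1; a bucket index that never occurs in k_ar yields an empty array.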