Split array by list of sub-array indices

Question

Suppose I have an array X and a list of indices k_ar, the maximum of which is K - 1.

What I want to do is basically split X in such a way that X[i] goes into sub-array k_ar[i]. The O(n) way to do this would be the following:

X = [5, 1, 3, 2, 2, 1]
k_ar = [0, 1, 1, 0, 1, 2]  # one sub-array index per element of X
K = max(k_ar) + 1
sub_X = [[] for k in range(K)]
for k, x in zip(k_ar, X):
    sub_X[k].append(x)

Although this is asymptotically the ideal algorithm for this kind of thing, I was wondering if NumPy, SciPy or any other library had a faster way of doing it. I could, for example, do the following, but it is O(nK) instead of O(n), so it is sub-optimal for large K, even though each pass over the n elements is very fast:

import numpy as np
X = np.array([5, 1, 3, 2, 2, 1], dtype=np.int8)
k_ar = np.array([0, 1, 1, 0, 1, 2], dtype=np.int8)
K = int(k_ar.max()) + 1  # number of sub-arrays
sub_X = np.empty(K, dtype=object)  # one slot per sub-array
for k in range(K):
    # Boolean mask: one full pass over k_ar for every k
    sub_X[k] = X[k_ar == k]

So, again, is there a way of speeding this up without using e.g. Numba, Cython or PyPy?
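For reference, one pure-NumPy route that avoids the loop over K is to sort by index and cut the sorted array at the group boundaries. The following is only a sketch under my own assumptions: `split_by_index` is a hypothetical helper name, the indices are assumed to be non-negative integers, and the sort makes it O(n log n) in general rather than O(n).

```python
import numpy as np

def split_by_index(X, k_ar):
    """Hypothetical helper: group X[i] into sub-array k_ar[i] without a loop over K."""
    X = np.asarray(X)
    k_ar = np.asarray(k_ar)
    # A stable sort keeps the original relative order inside each group.
    order = np.argsort(k_ar, kind="stable")
    # counts[k] is the number of elements that belong to sub-array k.
    counts = np.bincount(k_ar)
    # Cut points are the cumulative group sizes; the last one equals len(X), so drop it.
    return np.split(X[order], np.cumsum(counts)[:-1])

sub_X = split_by_index([5, 1, 3, 2, 2, 1], [0, 1, 1, 0, 1, 2])
```

Indices that never occur in `k_ar` simply come back as empty sub-arrays, because `np.bincount` reports a count of zero for them.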

The first example looks good. You need `np.array` for the second example BTW. — Eric Duminil, Aug 01 '17 at 12:19

score 0 · Answer 1 · edited Aug 07 '17 at 09:10

Your algorithm is already O(n): finding the maximum takes n steps, creating the empty lists takes K steps, and placing the elements takes another n steps.

Also, I'm not sure there is any reason to keep the original list and indices intact during iteration. If there isn't, you can pop elements as you go, which keeps memory at n elements instead of 2n.

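Final code - O(n) memory, O(n) CPU:

X = [5, 1, 3, 2, 2, 1]
k_ar = [0, 1, 1, 0, 1, 2]  # one index per element of X
sub_x = []
# Consumes X and k_ar from the end, so each sub-list is filled in reverse order.
while X:
    k = k_ar.pop()
    try:
        sub_x[k].append(X.pop())
    except IndexError:
        # sub_x is too short: grow it up to index k, then retry.
        sub_x.extend([] for i in range(len(sub_x), k+1))
        sub_x[k].append(X.pop())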
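Note that popping from the end fills each sub-list back to front. If the original order matters, a minimal sketch of one way to restore it is to reverse each sub-list in place afterwards, which is still O(n) overall. The function name `split_consuming` is hypothetical, and it assumes `X` and `k_ar` are plain lists of the same length that may be emptied.

```python
def split_consuming(X, k_ar):
    """Hypothetical helper: empties X and k_ar while grouping X[i] into sub-list k_ar[i]."""
    sub_x = []
    while X:
        k = k_ar.pop()
        if k >= len(sub_x):
            # Grow the list of sub-lists so that index k exists.
            sub_x.extend([] for _ in range(len(sub_x), k + 1))
        sub_x[k].append(X.pop())
    # Elements were appended back to front; reverse in place to restore input order.
    for group in sub_x:
        group.reverse()
    return sub_x

X = [5, 1, 3, 2, 2, 1]
k_ar = [0, 1, 1, 0, 1, 2]
sub_x = split_consuming(X, k_ar)
```

Both input lists are empty once the function returns, so this only makes sense when they are not needed afterwards.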

answered Aug 01 '17 at 12:30

Bharel

Wait, doesn't `O(n) = O(kn)` when `k` is constant? I.e. `O(3n) = O(n) = O(2n)`? – Błażej Michalik Aug 03 '17 at 14:11
Not entirely sure but it's half the memory so why the hell not :-) – Bharel Aug 03 '17 at 15:18
Yes, of course, I'm just pointing out that (I think) that's how the big-O notation works :) – Błażej Michalik Aug 04 '17 at 05:52
