
Suppose I have an array X and a list of indices k_ar, the maximum of which is K - 1.

What I want to do is basically split X in such a way that X[i] goes into sub-array k_ar[i]. The O(n) way to do this would be the following:

X = [5, 1, 3, 2, 2, 1]
k_ar = [0, 1, 1, 0, 1, 2]  # one index per element of X
K = max(k_ar) + 1
sub_X = [[] for k in range(K)]
for k, x in zip(k_ar, X):
    sub_X[k].append(x)

Although this is the ideal algorithm for the job, I was wondering whether NumPy, SciPy or any other library has a faster way of doing it. I could, for example, do the following, which is very fast in n but O(nK) rather than O(n), and so sub-optimal for large K:

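import numpy as np

X = np.array([5, 1, 3, 2, 2, 1], dtype=np.int8)
k_ar = np.array([0, 1, 1, 0, 1, 2], dtype=np.int8)
K = int(k_ar.max()) + 1  # number of sub-arrays
sub_X = np.empty(K, dtype=object)  # object array holding one sub-array per index
for k in range(K):
    sub_X[k] = X[k_ar == k]  # boolean mask: one full pass over X per k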

So, again, is there a way of speeding this up without using e.g. Numba, Cython or PyPy?

Hameer Abbasi

1 Answer


Your first algorithm is indeed O(n): finding the maximum takes n steps, creating the empty sub-lists takes K steps, and placing the elements takes another n steps. That is O(n + K) overall, which is O(n) as long as K does not exceed n.

Also, if there is no reason to keep the original list and indices intact during iteration, you can pop elements as you go, which keeps memory at about n elements instead of 2n.

Final code - O(n) memory, O(n) CPU:

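X = [5, 1, 3, 2, 2, 1]
k_ar = [0, 1, 1, 0, 1, 2]  # one index per element of X
sub_x = []
while X:
    # Pop from the end so the inputs shrink as the output grows.
    k = k_ar.pop()
    try:
        sub_x[k].append(X.pop())
    except IndexError:
        # Sub-list k does not exist yet: grow sub_x up to index k.
        sub_x.extend([] for i in range(len(sub_x), k + 1))
        sub_x[k].append(X.pop())
# Popping from the end fills each sub-list in reverse; restore the order.
for sub in sub_x:
    sub.reverse()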
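
As for the NumPy part of the question, one possible approach is a stable sort followed by a split. This is only a sketch and not part of the original answer: the helper name split_by_index is hypothetical, k_ar is assumed to hold non-negative integers, and the sort makes it O(n log n) rather than O(n). It trades asymptotic cost for vectorized execution, so whether it is actually faster is worth timing on your own data.

```python
import numpy as np

def split_by_index(X, k_ar):
    """Group X[i] into sub-array k_ar[i] (hypothetical helper, illustration only)."""
    X = np.asarray(X)
    k_ar = np.asarray(k_ar)
    # A stable sort keeps the original relative order inside each group.
    order = np.argsort(k_ar, kind="stable")
    # counts[k] is the number of elements assigned to sub-array k.
    counts = np.bincount(k_ar)
    # Cut the reordered array at the cumulative group boundaries.
    return np.split(X[order], np.cumsum(counts)[:-1])

sub_X = split_by_index([5, 1, 3, 2, 2, 1], [0, 1, 1, 0, 1, 2])
```

Indices that never occur in k_ar simply yield empty sub-arrays, because np.bincount reports a count of zero for them.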
Bharel