Approach #1
Here's a NumPy vectorized approach using broadcasting
-
R,C = np.where((A[:,None,None] == B).any(-1))
out = np.split(C,np.flatnonzero(R[1:]>R[:-1])+1)
Approach #2
Assuming A
and B
to hold positive numbers, we can consider those to represent indices on a 2D
grid, such that B
could be considered to hold the column indices on per row basis. Once that 2D
grid corresponding to B
is in place, we just need to consider only the columns that are intersected by A
. Finally, we get the indices of True
values in a such a 2D
grid to give us R
and C
values. This should be much more memory-efficient.
Thus, the alternative approach would look something like this -
ncols = B.max()+1
nrows = B.shape[0]
mask = np.zeros((nrows,ncols),dtype=bool)
mask[np.arange(nrows)[:,None],B] = 1
mask[:,~np.in1d(np.arange(mask.shape[1]),A)] = 0
R,C = np.where(mask.T)
out = np.split(C,np.flatnonzero(R[1:]>R[:-1])+1)
Sample run -
In [43]: A
Out[43]: array([0, 1, 2, 3])
In [44]: B
Out[44]:
array([[3, 2, 0],
[0, 2, 1],
[2, 3, 1],
[3, 0, 1]])
In [45]: out
Out[45]: [array([0, 1, 3]), array([1, 2, 3]), array([0, 1, 2]), array([0, 2, 3])]
Runtime test
Scaling up the dataset sizes by 100x
, here's a quick runtime test result -
In [85]: def index_1din2d(A,B):
...: R,C = np.where((A[:,None,None] == B).any(-1))
...: out = np.split(C,np.flatnonzero(R[1:]>R[:-1])+1)
...: return out
...:
...: def index_1din2d_initbased(A,B):
...: ncols = B.max()+1
...: nrows = B.shape[0]
...: mask = np.zeros((nrows,ncols),dtype=bool)
...: mask[np.arange(nrows)[:,None],B] = 1
...: mask[:,~np.in1d(np.arange(mask.shape[1]),A)] = 0
...: R,C = np.where(mask.T)
...: out = np.split(C,np.flatnonzero(R[1:]>R[:-1])+1)
...: return out
...:
In [86]: A = np.unique(np.random.randint(0,10000,(400)))
...: B = np.random.randint(0,10000,(400,300))
...:
In [87]: %timeit [np.where((B == x).sum(axis = 1))[0] for x in A]
1 loop, best of 3: 161 ms per loop # @Psidom's soln
In [88]: %timeit index_1din2d(A,B)
10 loops, best of 3: 91.5 ms per loop
In [89]: %timeit index_1din2d_initbased(A,B)
10 loops, best of 3: 33.4 ms per loop
Further performance-boost!
Well, alternatively we can create the 2D
grid in the second approach in a transposed way. The idea is to avoid the transpose in R,C = np.where(mask.T)
, which seemed like the bottleneck. So, a modified version of the second approach and the associated runtimes would look something like this -
In [135]: def index_1din2d_initbased_v2(A,B):
...: nrows = B.max()+1
...: ncols = B.shape[0]
...: mask = np.zeros((nrows,ncols),dtype=bool)
...: mask[B,np.arange(ncols)[:,None]] = 1
...: mask[~np.in1d(np.arange(mask.shape[0]),A)] = 0
...: R,C = np.where(mask)
...: out = np.split(C,np.flatnonzero(R[1:]>R[:-1])+1)
...: return out
...:
In [136]: A = np.unique(np.random.randint(0,10000,(400)))
...: B = np.random.randint(0,10000,(400,300))
...:
In [137]: %timeit index_1din2d_initbased(A,B)
10 loops, best of 3: 57.5 ms per loop
In [138]: %timeit index_1din2d_initbased_v2(A,B)
10 loops, best of 3: 25.9 ms per loop