
I have two NumPy arrays, a and b, and I want to subtract each row of b from every row of a. I tried:

a - b[:, None]

This works for small arrays, but takes too long at real-world data sizes.

a = np.arange(16).reshape(8,2)

a
Out[35]: 
array([[ 0,  1],
       [ 2,  3],
       [ 4,  5],
       [ 6,  7],
       [ 8,  9],
       [10, 11],
       [12, 13],
       [14, 15]])

b = np.arange(6).reshape(3,2)

b
Out[37]: 
array([[0, 1],
       [2, 3],
       [4, 5]])

a - b[:, None]
Out[38]: 
array([[[ 0,  0],
        [ 2,  2],
        [ 4,  4],
        [ 6,  6],
        [ 8,  8],
        [10, 10],
        [12, 12],
        [14, 14]],

       [[-2, -2],
        [ 0,  0],
        [ 2,  2],
        [ 4,  4],
        [ 6,  6],
        [ 8,  8],
        [10, 10],
        [12, 12]],

       [[-4, -4],
        [-2, -2],
        [ 0,  0],
        [ 2,  2],
        [ 4,  4],
        [ 6,  6],
        [ 8,  8],
        [10, 10]]])

%%timeit
a - b[:, None]
The slowest run took 10.36 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 3.18 µs per loop

This approach is too slow for larger arrays:

a1 = np.arange(18900 * 41).reshape(18900, 41)

b1 = np.arange(2674 * 41).reshape(2674, 41)

%%timeit
a1 - b1[:, None]
1 loop, best of 3: 12.1 s per loop

%%timeit
for index in range(len(b1)):
    a1 - b1[index]
1 loop, best of 3: 2.35 s per loop

Is there any numpy trick I can use to speed this up?

mistakeNot

1 Answer


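You are running into memory limits. The result of a1 - b1[:, None] has shape (2674, 18900, 41), which is about 2.07 billion elements; if the arrays are int64, that is roughly 16.6 GB.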
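
To put a number on the memory cost, here is a rough sketch that computes the size of the broadcast result without allocating it. The helper name broadcast_result_nbytes is made up for illustration, and np.broadcast_shapes needs NumPy 1.20 or later.

```python
import numpy as np

def broadcast_result_nbytes(a_shape, b_shape, dtype):
    """Bytes needed to hold a - b[:, None], without allocating it.

    Hypothetical helper for illustration; not part of NumPy.
    """
    # b[:, None] has shape (len(b), 1, ncols); broadcasting it against
    # a (len(a), ncols) gives (len(b), len(a), ncols).
    b_expanded = (b_shape[0], 1) + tuple(b_shape[1:])
    out_shape = np.broadcast_shapes(tuple(a_shape), b_expanded)
    n_elements = 1
    for dim in out_shape:
        n_elements *= dim  # plain Python ints, so no overflow
    return n_elements * np.dtype(dtype).itemsize

for dt in (np.int64, np.int32, np.uint8):
    n = broadcast_result_nbytes((18900, 41), (2674, 41), dt)
    print(np.dtype(dt).name, f"{n / 1e9:.2f} GB")
# int64 16.58 GB
# int32 8.29 GB
# uint8 2.07 GB
```

Note that the example data goes up to 774,899, so int32 is the narrowest of these that holds it without wrapping; uint8 only applies when the real values fit in 0..255.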

If, as in your examples, 8 bits are sufficient to store the data, use uint8:

import numpy as np
a1 = np.arange(18900 * 41,dtype=np.uint8).reshape(18900, 41)
b1 = np.arange(2674 * 41,dtype=np.uint8).reshape(2674, 41)
%time c1 = a1 - b1[:, None]
# 1.02 s
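
If a narrower dtype is not an option, another way to stay within memory is to process b a few rows at a time, in the spirit of the loop in the question. This is a different technique from the uint8 suggestion, shown only as a minimal sketch: it assumes each block can be consumed (reduced, written out, and so on) before the next one is computed. subtract_in_chunks and chunk_rows are hypothetical names, and 256 is an arbitrary default.

```python
import numpy as np

def subtract_in_chunks(a, b, chunk_rows=256):
    """Yield (start, block) pairs with block == a - b[start:start + chunk_rows, None].

    Hypothetical helper for illustration; not part of NumPy.
    """
    for start in range(0, len(b), chunk_rows):
        # Each block has shape (<= chunk_rows, len(a), a.shape[1]), so peak
        # memory scales with chunk_rows instead of len(b).
        yield start, a - b[start:start + chunk_rows, None]

# Check against the full broadcast on the small arrays from the question.
a = np.arange(16).reshape(8, 2)
b = np.arange(6).reshape(3, 2)
blocks = [block for _, block in subtract_in_chunks(a, b, chunk_rows=2)]
assert np.array_equal(np.concatenate(blocks), a - b[:, None])
```

Whether this is faster than the single broadcast depends on the machine; the point is that the full (2674, 18900, 41) array never has to exist at once.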
B. M.