0

I have a (500000,30) numpy array and we can look it as a length-500000 list of size-30 vectors. I want to choose arbitrary 4 elements in the vector, calculate its product, and store all the 4-element-products. Finally I need to calculate the mean of 500000 results.

I have tried it with np.einsum but it runs really slow. How can I improve the efficiency?

# array.shape = (500000,30)
expect = np.sum(np.einsum('ni,nj,nk,nr->ijkr',array,array,array,array),axis=0)/500000
Inm
  • 131
  • 3
  • Please show a loop-based implementation on a 5x4 array, selecting 3 elements. I don't see where you are making the selection of 4 elements. When prose and code don't match, it's really hard to test an implementation. – Mad Physicist Oct 27 '21 at 18:13
  • A very valuable tool of debugging and implementation is to make a *small* example that you can visualize entirely, i.e., a [mcve]. You can't expect for me to test against a 500k-length array of anything, but you can surely expect me to match a 5-length array. – Mad Physicist Oct 27 '21 at 18:13

2 Answers2

0

The sum of all possible products of 4 entries in a vector v is equal to the 4th power of the sum of entries of v. Note that this assumes that the same vector entry can appear more than once in a product, and the sum will include products that differ only by the order of their entries (so e.g. v[1] * v[2] * v[3] * v[4] and v[4] * v[3] * v[2] * v[1] will count as different products). Since your code performs the same computations, I assume that this is what you want. In any case, the value of

np.sum(np.einsum('ni,nj,nk,nr->ijkr', array, array, array, array))

is the same as that of

(array.sum(axis=1)**4).sum()

but the latter will be computed much faster.

In your code you are taking the sum along the 0-axis of the 30x30x30x30 array produced by np.einsum, but I am not sure why.

bb1
  • 7,174
  • 2
  • 8
  • 23
  • Be careful, *the OP has edited the question* to use `np.sum(...,axis=0)` instead of `np.sum(...)`. The former gives a matrix in output while the later gives a scalar and is much easier to compute. The two results in different output. – Jérôme Richard Oct 27 '21 at 18:28
  • 1
    @JérômeRichard This is noted in the last sentence of my answer. The OP writes that they want to compute the average of 500,000 results. SInce in either case there are no 500,000 results, I am not certain what is the real goal here. – bb1 Oct 27 '21 at 18:33
0

You can compute the solution much more efficiently by factorizing the dot products in the last dimension. Moreover, you can tell Numpy to optimize the einsum (at the expense of a higher latency which is not a problem here). Here is the resulting code:

expect = np.einsum('n,nj,nk,nr->jkr',np.sum(array, axis=1),array,array,array,optimize=True)/500000

This is 63 times faster on my machine. If you want to optimize this further, then you can perform the computation in parallel using multiple threads. Indeed, the default Numpy implementation is sequential. You can use Numba to do that. I expect the computation to be about 360 times faster on my 6-core machine (since the computation is compute-bound).

Jérôme Richard
  • 41,678
  • 6
  • 29
  • 59