
**Goal** My goal is to compute the tensor given by the formula below. The indices i, j, k, l each run over 40 values and p, m, x over 80 values.

The formula for the tensor:

$$T_{ijkl} = \sum_{p,m,x} A_{pm}\, B_{ijpx}\, B_{klmx}$$

**Tensordot approach** This summation just contracts six indices of an enormous tensor product. I tried to do it with `np.tensordot`, which allows such a calculation, but then my problem is memory, even if I do one `tensordot` followed by the other. (I work in Colab, so I have 12 GB of RAM available.)

**Nested loops approach** But there are additional symmetries governing the B tensor: the only non-zero elements of B_{ijpx} are those with i + j = p + x. I was therefore able to write p and m as functions of x (p = i + j − x, m = k + l − x) and use just five loops, over i, j, k, l, x. On the other hand, timing is now the problem, since the calculation takes 136 seconds and I want to repeat it many times.

**Timing goal in the nested loop approach** Reducing the time by a factor of ten would be satisfactory, but a factor of 100 would be more than enough.

Do you have any ideas for either working around the memory problem or reducing the timing? How do you handle such summations with additional constraints?

(Remark: the matrix A is symmetric, and I have not used this fact so far. There are no further symmetries.)

Here is the code for the nested loop approach:

import numpy as np

# A has shape (80, 80) and B has shape (40, 40, 80, 80), given elsewhere.
T = np.zeros((40, 40, 40, 40))
for i in range(40):
    for j in range(40):
        for k in range(40):
            for l in range(40):
                Sum = 0
                for x in range(80):
                    # The constraint i+j = p+x (and k+l = m+x) fixes p and m.
                    p = i + j - x
                    m = k + l - x
                    if 0 <= p < 80 and 0 <= m < 80:
                        Sum += A[p, m] * B[i, j, p, x] * B[k, l, m, x]
                T[i, j, k, l] = Sum

And the code for the `tensordot` approach:

P = np.tensordot(A, B, axes=((0,), (2,)))      # contract p -> P[m, i, j, x]
T = np.tensordot(P, B, axes=((0, 3), (2, 3)))  # contract m, x -> T[i, j, k, l]
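
For reference, this is how the axes in the two `tensordot` calls line up; a small sketch with random data, just to trace the shapes at the dimensions stated above:

import numpy as np

A = np.random.rand(80, 80)          # A[p, m]
B = np.random.rand(40, 40, 80, 80)  # B[i, j, p, x]

# Contract p: P[m, i, j, x] = sum_p A[p, m] * B[i, j, p, x]
P = np.tensordot(A, B, axes=((0,), (2,)))
print(P.shape)  # (80, 40, 40, 80)

# Contract m and x: T[i, j, k, l] = sum_{m,x} P[m, i, j, x] * B[k, l, m, x]
T = np.tensordot(P, B, axes=((0, 3), (2, 3)))
print(T.shape)  # (40, 40, 40, 40)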
  • Can you show your attempts with `tensordot` and with nested loops, so we get a better idea of the problem? – jdehesa Jan 29 '19 at 16:27
  • Also, have you tried [`tf.einsum`](https://www.tensorflow.org/api_docs/python/tf/einsum)? Like: `t = tf.einsum('pm,ijpx,klmx->ijkl', a, b, b)`. Not sure it will make any difference compared to `tensordot`, but it may be worth a try. – jdehesa Jan 29 '19 at 16:38
  • Also do you need this in TensorFlow or NumPy? Looks like your code is NumPy but the tags include TensorFlow. – jdehesa Jan 29 '19 at 16:49
  • I work in NumPy, sorry about the misleading tag. I updated the question a little bit. And no I have not tried the tf.einsum approach. – Michal Jan 29 '19 at 16:55
  • Okay thanks for the update. Well it's the same in NumPy really, `t = np.einsum('pm,ijpx,klmx->ijkl', a, b, b)`. – jdehesa Jan 29 '19 at 17:08
  • Just to clarify: the einsum takes much, much longer than the nested loops because it does not use the additional constraint. After more than 5 minutes of waiting I killed the process. – Michal Jan 29 '19 at 20:00

1 Answer


Numba might be your best bet here. I put together this function based on your code. I changed it a bit to avoid the `if` block and some unnecessary iterations: since p = i + j − x and m = k + l − x must both lie in [0, dx), the valid range of x can be computed up front:

import numpy as np
import numba as nb

@nb.njit(parallel=True)
def my_formula_nb(A, B):
    di, dj, dx, _ = B.shape
    T = np.zeros((di, dj, di, dj), dtype=A.dtype)
    for i in nb.prange(di):
        for j in nb.prange(dj):
            for k in nb.prange(di):
                for l in nb.prange(dj):
                    acc = 0.0
                    # Only x with 0 <= i+j-x < dx and 0 <= k+l-x < dx contribute.
                    x_start = max(0, i + j - dx + 1, k + l - dx + 1)
                    x_end = min(dx, i + j + 1, k + l + 1)
                    for x in range(x_start, x_end):
                        p = i + j - x
                        m = k + l - x
                        acc += A[p, m] * B[i, j, p, x] * B[k, l, m, x]
                    T[i, j, k, l] = acc
    return T

Let's see it in action:

import numpy as np

def make_problem(di, dj, dx):
    a = np.random.rand(dx, dx)
    a = a + a.T  # make A symmetric
    b = np.random.rand(di, dj, dx, dx)
    # Enforce the constraint: B[i, j, p, x] is non-zero only when i + j == p + x.
    b_ind = np.indices(b.shape)
    b_mask = b_ind[0] + b_ind[1] != b_ind[2] + b_ind[3]
    b[b_mask] = 0
    return a, b

# Generate a problem
np.random.seed(100)
a, b = make_problem(15, 20, 25)
# Solve with Numba function
t1 = my_formula_nb(a, b)
# Solve with einsum
t2 = np.einsum('pm,ijpx,klmx->ijkl', a, b, b)
# Check result
print(np.allclose(t1, t2))
# True

# Benchmark (IPython)
%timeit np.einsum('pm,ijpx,klmx->ijkl', a, b, b)
# 4.5 s ± 39.2 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%timeit my_formula_nb(a, b)
# 6.06 ms ± 20.4 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

As you can see, the Numba solution is about three orders of magnitude faster, and it should not take any more memory than necessary.
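
For context, at the full problem size from your question the arrays involved stay small (assuming float64, B is about 82 MB and T about 20 MB), so a run like this sketch, reusing `make_problem` from above, fits comfortably in 12 GB of RAM:

# Full problem size: i, j, k, l over 40 values; p, m, x over 80.
a_full, b_full = make_problem(40, 40, 80)
t_full = my_formula_nb(a_full, b_full)
print(t_full.shape)         # (40, 40, 40, 40)
print(b_full.nbytes / 1e6)  # ~81.9 MB for B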

  • Thank you very much!! :) Your code works brilliantly. The timing was reduced by a factor of 100, which is my timing heaven. I just edited the line `di, dj, dx, _ = B.shape` to `di, dj, dx, _ = (40, 40, 80, 80)`, since the output tensor is supposed to have all four dimensions of size 40. – Michal Jan 29 '19 at 20:03