
I am not actually using numpy but the Eigen::Tensor C++ API, which only provides contraction operations; the Python below is just to help me think through the implementation.

So 'ij,ijk->ik' is basically like doing a for loop over the shared first dimension, multiplying each vector a[i] by the matrix b[i]:

import numpy as np

a = np.random.uniform(size=[10, 4])
b = np.random.uniform(size=[10, 4, 4])

# loop over the batch dimension: each a[i] (shape (4,)) times b[i] (shape (4, 4))
vec = []
for i in range(10):
  vec.append(a[i].dot(b[i]))
print(np.stack(vec, axis=0))

# or equivalently, with einsum
print(np.einsum('ij,ijk->ik', a, b))

It seems this cannot easily be done with tensordot. Any suggestions?
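
Edit: for concreteness, here is a minimal (untested) sketch of the per-batch loop I have in mind with Eigen::Tensor, using chip() to fix the batch index and contract() for each vector-matrix product. It assumes Eigen's unsupported Tensor module:

#include <unsupported/Eigen/CXX11/Tensor>

int main() {
  Eigen::Tensor<float, 2> a(10, 4);     // 'ij'
  Eigen::Tensor<float, 3> b(10, 4, 4);  // 'ijk'
  a.setRandom();
  b.setRandom();

  // contract dim 0 of the vector slice with dim 0 of the matrix slice
  Eigen::array<Eigen::IndexPair<int>, 1> dims = {Eigen::IndexPair<int>(0, 0)};

  Eigen::Tensor<float, 2> out(10, 4);   // 'ik'
  for (int i = 0; i < 10; ++i) {
    // chip(i, 0) fixes index i along dimension 0, like a[i] and b[i] in numpy
    out.chip(i, 0) = a.chip(i, 0).contract(b.chip(i, 0), dims);
  }
  return 0;
}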

jack
  • What `tensordot` are you asking about? The `numpy` one can't do this. `np.matmul` can. – hpaulj Aug 05 '20 at 01:00
  • I am asking about numpy. Hmm, I see, it can be done with np.matmul by expanding a dim of a. But what I really want to solve is using the Eigen::Tensor library for such operations. There seems to be no such equivalent there. – jack Aug 05 '20 at 03:08
  • `np.tensordot` just reshapes and transposes the arguments so it can use `np.dot`. It does not do any sort of 'batching'. That's why `matmul` was added. So despite the name, I wouldn't expect an equivalent in a C++ library. But in C++ it shouldn't be expensive to loop over the 'batch' dimension. C++ doesn't have the distinction between slow interpreted user loops and fast compiled ones. – hpaulj Aug 05 '20 at 03:32
  • Thanks for the comment. Could you elaborate on why it won't be expensive for C++ to do such a loop along the batch dimension? In my case, I could have hundreds or thousands of iterations in the for loop. – jack Aug 05 '20 at 20:32
  • In C++ all code is compiled, whether you write it yourself or use a library. – hpaulj Aug 05 '20 at 21:52
  • But why would that make this problem fast, though? Isn't that still serial? In the case of np.matmul, I assume we get some parallelism through batching. Do you mean we don't need batching in this case but still get good performance in C++? – jack Aug 05 '20 at 21:57
  • What is the real reason for doing this? Calling a BLAS routine repeatedly with such a small (vector, matrix) multiplication is actually the only thing which would be a real (performance-critical) mistake -> the calling overhead is far too high. The best thing would be to completely unroll the vector-matrix multiplication, but simple loops will also do their job. – max9111 Aug 06 '20 at 13:48
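
A hand-rolled version along the lines of max9111's suggestion above is just three nested loops, with no per-slice BLAS call. This is only an illustrative sketch; the function name and hard-coded sizes match the example, nothing more:

#include <cstddef>

// out[i][k] = sum over j of a[i][j] * b[i][j][k], i.e. 'ij,ijk->ik'
void batched_vec_mat(const float a[10][4], const float b[10][4][4],
                     float out[10][4]) {
  for (std::size_t i = 0; i < 10; ++i) {     // batch dimension
    for (std::size_t k = 0; k < 4; ++k) {    // output column
      float acc = 0.0f;
      for (std::size_t j = 0; j < 4; ++j) {  // contracted dimension
        acc += a[i][j] * b[i][j][k];
      }
      out[i][k] = acc;
    }
  }
}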

0 Answers