
I am not actually using numpy but the Eigen::Tensor C++ API, which only provides contraction operations; the Python below is just to help me think through the implementation.

So 'ij,ijk->ik' is basically like doing a for loop over the shared first dimension, multiplying each vector a[i] by the matrix b[i]:

import numpy as np

a = np.random.uniform(size=[10, 4])
b = np.random.uniform(size=[10, 4, 4])

# loop over the batch dimension: each a[i] (shape (4,)) times b[i] (shape (4, 4))
vec = []
for i in range(10):
  vec.append(a[i].dot(b[i]))
print(np.stack(vec, axis=0))

# or equivalently, with einsum
print(np.einsum('ij,ijk->ik', a, b))

It seems this cannot easily be done with tensordot. Any suggestions?
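
Edit: for concreteness, here is a minimal (untested) sketch of the per-batch loop I have in mind with Eigen::Tensor, using chip() to fix the batch index and contract() for each vector-matrix product. It assumes Eigen's unsupported Tensor module:

#include <unsupported/Eigen/CXX11/Tensor>

int main() {
  Eigen::Tensor<float, 2> a(10, 4);     // 'ij'
  Eigen::Tensor<float, 3> b(10, 4, 4);  // 'ijk'
  a.setRandom();
  b.setRandom();

  // contract dim 0 of the vector slice with dim 0 of the matrix slice
  Eigen::array<Eigen::IndexPair<int>, 1> dims = {Eigen::IndexPair<int>(0, 0)};

  Eigen::Tensor<float, 2> out(10, 4);   // 'ik'
  for (int i = 0; i < 10; ++i) {
    // chip(i, 0) fixes index i along dimension 0, like a[i] and b[i] in numpy
    out.chip(i, 0) = a.chip(i, 0).contract(b.chip(i, 0), dims);
  }
  return 0;
}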

jack
  • What `tensordot` are you asking about? The `numpy` one can't do this. `np.matmul` can. – hpaulj Aug 05 '20 at 01:00
  • I am asking about numpy. Hmm, I see, it can be done with np.matmul by expanding a dim of a. But what I really want to solve is using the Eigen::Tensor library for such operations. There seems to be no such equivalent there. – jack Aug 05 '20 at 03:08
  • `np.tensordot` just reshapes and transposes the arguments so it can use `np.dot`. It does not do any sort of 'batching'. That's why `matmul` was added. So despite the name, I wouldn't expect an equivalent in a C++ library. But in C++ it shouldn't be expensive to loop over the 'batch' dimension. C++ doesn't have the distinction between slow interpreted user loops and fast compiled ones. – hpaulj Aug 05 '20 at 03:32
  • Thanks for the comment. Could you elaborate on why it won't be expensive for C++ to do such a loop along the batch dimension? In my case, I could have hundreds or thousands of iterations in the for loop. – jack Aug 05 '20 at 20:32
  • In C++ all code is compiled, whether you write it yourself or use a library. – hpaulj Aug 05 '20 at 21:52
  • But why would that make this problem fast, though? Isn't that still serial? In the case of np.matmul, I assume we get some parallelism through batching. Do you mean we don't need batching in this case but still get good performance in C++? – jack Aug 05 '20 at 21:57
  • What is the real reason for doing this? Calling a BLAS routine repeatedly with such a small (vector, matrix) multiplication is actually the only thing which would be a real (performance-critical) mistake -> the calling overhead is far too high. The best thing would be to completely unroll the vector-matrix multiplication, but simple loops will also do their job. – max9111 Aug 06 '20 at 13:48
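
A hand-rolled version along the lines of max9111's suggestion above is just three nested loops, with no per-slice BLAS call. This is only an illustrative sketch; the function name and hard-coded sizes match the example, nothing more:

#include <cstddef>

// out[i][k] = sum over j of a[i][j] * b[i][j][k], i.e. 'ij,ijk->ik'
void batched_vec_mat(const float a[10][4], const float b[10][4][4],
                     float out[10][4]) {
  for (std::size_t i = 0; i < 10; ++i) {     // batch dimension
    for (std::size_t k = 0; k < 4; ++k) {    // output column
      float acc = 0.0f;
      for (std::size_t j = 0; j < 4; ++j) {  // contracted dimension
        acc += a[i][j] * b[i][j][k];
      }
      out[i][k] = acc;
    }
  }
}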

0 Answers