Suppose I have two arrays of type int8. I want to use einsum on them in such a way that all the calculations will be done as int64, but I don't want to convert the whole arrays to int64. If I understand correctly, this is what the dtype argument is for. But it seems to not always work.
An example where it does work as expected:
>>> A = np.array([[123, 45],[67,89]], dtype='int8')
>>> np.einsum(A,[0,1],A,[0,1],[1]) ## => integer overflow
array([-94, -38], dtype=int8)
>>> np.einsum(A,[0,1],A,[0,1],[1], dtype='int64') ## => no overflow
array([19618, 9946], dtype=int64)
An example where it does not work as expected:
>>> A = np.array([[123, 45],[67,89]], dtype='int8')
>>> np.einsum(A,[0,1],A,[1,2],[0,2]) ## => integer overflow
array([[-32, 68],
[124, -72]], dtype=int8)
>>> np.einsum(A,[0,1],A,[1,2],[0,2], dtype='int64') ## => should not overflow, but it does
array([[-32, 68],
[124, -72]], dtype=int8)
For these examples I used python 3.6.4 on windows with numpy 1.14.0
The same happens when I try other dtypes, e.g., float64, and even if I use casting='unsafe'.
Why does this happen, and how can I make it work?
UPDATE:
einsum has optimize=True as default, at least in numpy 1.14.0.
When using optimize=False, it works as expected (although much slower, for large arrays):
>>> A = np.array([[123, 45],[67,89]], dtype='int8')
>>> np.einsum(A,[0,1],A,[1,2],[0,2]) ## => integer overflow
array([[-32, 68],
[124, -72]], dtype=int8)
>>> np.einsum(A,[0,1],A,[1,2],[0,2], dtype='int64', optimize=False)
array([[18144, 9540],
[14204, 10936]], dtype=int64)
From a brief look at einsum.py, it seems that when optimize=True, then it checks whether it is better to use numpy.tensordot instead of the einsum implementation. If it is (which it should be in my 2nd example, as it is just a regular matrix multiplication), then it uses tensordot, but it does not pass the dtype argument to it. Indeed, tensordot does not even have a dtype argument.
If this is correct, then a followup question is required (which is probably worth its own post?):
How can I matrix-multiply two matrices of a certain dtype, say, int8, in such a way that all the calculations will be done as, say, int64 or float64 (and so will not overflow / loose precision, unless an int64/float64 would too), but without having to convert them first to the desired type, and moreover, the operation's implementation itself should not convert the matrices as a whole, only small parts each time (so the memory required for the operation will not be much larger than the memory required just to hold these matrices and the result)?
Can this be done with efficiency comparable to numpy.dot?