
TensorFlow has a function called batch_matmul which multiplies higher-dimensional tensors. But I'm having a hard time understanding how it works, perhaps partly because I'm having a hard time visualizing it.

[image: diagram of the tensors being multiplied]

What I want to do is multiply a matrix by each slice of a 3D tensor, but I don't quite understand what the shape of tensor a is. Is z the innermost dimension? Which of the following is correct?

[image: candidate layouts of the 3D tensor]

I would most prefer the first to be correct -- it's most intuitive to me and easy to see in the .eval() output. But I suspect the second is correct.

Tensorflow says that batch_matmul performs:

out[..., :, :] = matrix(x[..., :, :]) * matrix(y[..., :, :])

What does that mean? What does that mean in the context of my example? What is being multiplied with what? And why am I not getting a 3D tensor the way I expected?

Alex Lenail

6 Answers


You can imagine it as doing a matmul over each training example in the batch.

For example, if you have two tensors with the following dimensions:

a.shape = [100, 2, 5]
b.shape = [100, 5, 2]

and you do tf.matmul(a, b), your output will have the shape [100, 2, 2].

100 is your batch size, the other two dimensions are the dimensions of your data.
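A quick way to sanity-check these shapes without TensorFlow: NumPy's np.matmul follows the same batch convention (a sketch using NumPy as a stand-in):

```python
import numpy as np

# a batch of 100 matrices, each 2x5, times a batch of 100 matrices, each 5x2
a = np.random.randn(100, 2, 5)
b = np.random.randn(100, 5, 2)

# matmul is applied pairwise over the leading (batch) dimension
c = np.matmul(a, b)
print(c.shape)  # (100, 2, 2)
```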

Augustin
Daniela
  • I feel like you're only partially answering the question. Specifically, why does b's first dimension in your example have to be 100? What if I have a tensor a, which is a batch of examples, and I want to apply the same operation to each of them, i.e. I want to multiply each of them by b, which is [5, 2]? Is the only way of accomplishing this with a tf.tile? And if not, how is the output of batch_matmul defined? – Alex Lenail Dec 17 '15 at 06:56
  • 1
    @AlexLenail: I have the exact same question - I want to multiply a 3D tensor by 2D tensor without explicit tiling of the 2D tensor. Did you find an answer? – ahmadh Apr 06 '16 at 14:27
  • Use the broadcasting mechanism supported by matmul – Andrzej Pronobis May 29 '16 at 21:35

First of all, tf.batch_matmul() was removed and is no longer available. Now you are supposed to use tf.matmul():

The inputs must be matrices (or tensors of rank > 2, representing batches of matrices), with matching inner dimensions, possibly after transposition.

So let's assume you have the following code:

import tensorflow as tf
batch_size, n, m, k = 10, 3, 5, 2
A = tf.Variable(tf.random_normal(shape=(batch_size, n, m)))
B = tf.Variable(tf.random_normal(shape=(batch_size, m, k)))
tf.matmul(A, B)

Now you will receive a tensor of shape (batch_size, n, k). Here is what is going on: assume you have batch_size matrices of shape n×m and batch_size matrices of shape m×k. For each pair of them you calculate (n×m) × (m×k), which gives you an n×k matrix. You will have batch_size of them.
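To make the pairwise multiplication explicit, here is a sketch in NumPy (whose np.matmul has the same batch semantics) comparing the batched result against an explicit loop:

```python
import numpy as np

batch_size, n, m, k = 10, 3, 5, 2
A = np.random.randn(batch_size, n, m)
B = np.random.randn(batch_size, m, k)

# one batched multiply...
C = np.matmul(A, B)  # shape (batch_size, n, k)

# ...equals multiplying each pair of matrices independently
C_loop = np.stack([A[i] @ B[i] for i in range(batch_size)])
print(np.allclose(C, C_loop))  # True
```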

Notice that something like this is also valid:

A = tf.Variable(tf.random_normal(shape=(a, b, n, m)))
B = tf.Variable(tf.random_normal(shape=(a, b, m, k)))
tf.matmul(A, B)

and will give you a tensor of shape (a, b, n, k).
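The same shape rule can be checked for the rank-4 case (again a NumPy sketch; np.matmul treats all leading dimensions as batch dimensions):

```python
import numpy as np

a_dim, b_dim, n, m, k = 2, 3, 4, 5, 2
A = np.random.randn(a_dim, b_dim, n, m)
B = np.random.randn(a_dim, b_dim, m, k)

# the last two axes are multiplied; everything before them is batched
C = np.matmul(A, B)
print(C.shape)  # (2, 3, 4, 2)
```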

Salvador Dali

You can now do it using tf.einsum, starting from TensorFlow 0.11.0rc0.

For example,

M1 = tf.Variable(tf.random_normal([2,3,4]))
M2 = tf.Variable(tf.random_normal([5,4]))  
N = tf.einsum('ijk,lk->ijl',M1,M2)       

It multiplies the matrix M2 with every frame (3 frames) in every batch (2 batches) in M1.

The output is:

[array([[[ 0.80474716, -1.38590837, -0.3379252 , -1.24965811],
        [ 2.57852983,  0.05492432,  0.23039417, -0.74263287],
        [-2.42627382,  1.70774114,  1.19503212,  0.43006262]],

       [[-1.04652011, -0.32753903, -1.26430523,  0.8810069 ],
        [-0.48935518,  0.12831448, -1.30816901, -0.01271309],
        [ 2.33260512, -1.22395933, -0.92082584,  0.48991606]]], dtype=float32),
array([[ 1.71076882, 0.79229093, -0.58058828, -0.23246667],
       [ 0.20446332,  1.30742455, -0.07969904,  0.9247328 ],
       [-0.32047141,  0.66072595, -1.12330854,  0.80426538],
       [-0.02781649, -0.29672042,  2.17819595, -0.73862702],
       [-0.99663496,  1.3840003 , -1.39621222,  0.77119476]], dtype=float32), 
array([[[ 0.76539308, 2.77609682, -1.79906654,  0.57580602, -3.21205115],
        [ 4.49365759, -0.10607499, -1.64613271,  0.96234947, -3.38823152],
        [-3.59156275,  2.03910899,  0.90939498,  1.84612727,  3.44476724]],

       [[-1.52062428,  0.27325237,  2.24773455, -3.27834225,  3.03435063],
        [ 0.02695178,  0.16020992,  1.70085776, -2.8645196 ,  2.48197317],
        [ 3.44154787, -0.59687197, -0.12784094, -2.06931567, -2.35522676]]], dtype=float32)]

I have verified that the arithmetic is correct.
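The same contraction can be reproduced in NumPy (a sketch, separate from the verified output above); note that 'ijk,lk->ijl' is just a matmul against the transpose:

```python
import numpy as np

M1 = np.random.randn(2, 3, 4)
M2 = np.random.randn(5, 4)

# same contraction as the tf.einsum call above
N = np.einsum('ijk,lk->ijl', M1, M2)
print(N.shape)  # (2, 3, 5)

# equivalently: multiply every batch in M1 by M2 transposed
N_ref = np.matmul(M1, M2.T)
print(np.allclose(N, N_ref))  # True
```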

xuancong84
  • Just a small doubt: is `tf.einsum()` fast or slow compared to other methods such as `batch_matmul()` and `matmul()`? I want to implement a tensordot product in TensorFlow, but only the `einsum()` method seems to support it and the rest of the methods need some reshaping and shaping-back procedures, so I want to know if it's efficient to use `einsum()` – pikachuchameleon Feb 04 '17 at 16:19
  • It should depend on the tensorflow implementation which varies across different versions. – xuancong84 Dec 18 '18 at 08:49
  • @pikachuchameleon It should be identical. einsum reduces to matmul and transposes. An equation will be slow if a transpose is needed as this requires a deep copy. – Roy Jul 16 '20 at 12:19

tf.tensordot should solve this problem. It supports batch operations, e.g., if you want to contract a 2D tensor with a 3D tensor, with the latter having a batch dimension.

If a has shape [n, m] and b has shape [?, m, l], then

y = tf.tensordot(b, a, axes=[1, 1])

will produce a tensor of shape [?, l, n]; transpose the last two axes if you need [?, n, l].

https://www.tensorflow.org/api_docs/python/tf/tensordot
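A NumPy sketch of the same call (np.tensordot has the same axes semantics), which also shows the output axis order:

```python
import numpy as np

n, m, l, batch = 4, 3, 5, 7
a = np.random.randn(n, m)
b = np.random.randn(batch, m, l)

# contract b's axis 1 (length m) with a's axis 1 (length m)
y = np.tensordot(b, a, axes=[1, 1])
print(y.shape)  # (7, 5, 4), i.e. the uncontracted axes of b, then of a
```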

aph

It is simply like splitting along the first dimension, multiplying each piece, and concatenating them back. If you want to do 3D by 2D, you can reshape, multiply, and reshape back, i.e. [100, 2, 5] -> [200, 5] -> [200, 2] -> [100, 2, 2].
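A minimal NumPy sketch of the reshape trick, checked against broadcasting (np.matmul broadcasts a 2-D matrix over the batch):

```python
import numpy as np

a = np.random.randn(100, 2, 5)
b = np.random.randn(5, 2)

# flatten the batch and row dimensions, do one 2-D matmul, restore the batch
out = (a.reshape(200, 5) @ b).reshape(100, 2, 2)

# matches multiplying each slice of a by b
ref = np.matmul(a, b)
print(np.allclose(out, ref))  # True
```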

Saurfang

One way to do this is with the tf.scan function.

If a has shape [5, 3, 2] (a batch of 5 matrices, each 3x2) and b has shape [2, 3] (a constant matrix to be multiplied with each sample), then:

def fn(prev, x):
    return tf.matmul(x, b)

initializer = tf.zeros([3, 3])  # must match the shape of fn's output
h = tf.scan(fn, a, initializer=initializer)

This h will store all the outputs.
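For intuition, what the scan computes is just a per-sample matmul; a NumPy sketch of the same result:

```python
import numpy as np

a = np.random.randn(5, 3, 2)  # batch of 5 matrices, each 3x2
b = np.random.randn(2, 3)     # constant matrix applied to each sample

# multiply each batch element by b, as the scan does
h = np.stack([x @ b for x in a])
print(h.shape)  # (5, 3, 3)
```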