From the GitHub code, it seems the MatMul op doesn't support partitioned matrices. So is there any tool in TensorFlow that supports multiplication of two huge matrices that are distributed across multiple nodes?
- Check out this answer: [link](http://stackoverflow.com/questions/35564253/tensorflow-element-wise-matrix-multiplication?rq=1). I think it can help you. – Vahan Nasibyan Feb 17 '17 at 14:19
- This is not related, but thanks a lot for your comment. – Jia Zou Feb 21 '17 at 17:18
1 Answer
Support for distributing computation across machines is built into TensorFlow. I would recommend reading the distributed TensorFlow docs to figure out how to set up a TensorFlow cluster.
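As a rough illustration (a minimal sketch using the TF 1.x `tf.train.ClusterSpec`/`tf.train.Server` API; the hostnames, ports, and job name are placeholders), each machine in the cluster would run something like:

```python
import tensorflow as tf

# Describe the cluster: a single "worker" job with three tasks.
# The host:port addresses are placeholders.
cluster = tf.train.ClusterSpec({
    "worker": ["host0:2222", "host1:2222", "host2:2222"]
})

# Each machine starts one server, identified by its task_index.
server = tf.train.Server(cluster, job_name="worker", task_index=0)
server.join()  # block and serve graph execution requests
```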
Once the cluster is set up, you can decide how to partition your problem and use `with tf.device` to assign each worker its partition of the work.
For instance, suppose you are multiplying `a` by its transpose `a'`, and you want to split the intermediate multiplications evenly over two workers and aggregate the results on a third. You would do something like this:
```python
with tf.device(worker0):
    # load a1
    b1 = tf.matmul(a1, tf.transpose(a1))
with tf.device(worker1):
    # load a2
    b2 = tf.matmul(a2, tf.transpose(a2))
with tf.device(worker2):
    result = b1 + b2
```
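To make this concrete, here is a self-contained sketch of the same pattern; the device strings and the gRPC address are assumptions that would need to match your actual cluster:

```python
import numpy as np
import tensorflow as tf

# Hypothetical device strings for a 3-task "worker" job.
worker0 = "/job:worker/task:0"
worker1 = "/job:worker/task:1"
worker2 = "/job:worker/task:2"

n, m = 4, 6
a = np.random.rand(n, m).astype(np.float32)

with tf.device(worker0):
    a1 = tf.constant(a[:, :m // 2])       # left column block of a
    b1 = tf.matmul(a1, tf.transpose(a1))  # partial n x n product
with tf.device(worker1):
    a2 = tf.constant(a[:, m // 2:])       # right column block of a
    b2 = tf.matmul(a2, tf.transpose(a2))
with tf.device(worker2):
    result = b1 + b2                      # equals a @ a.T

# Point the session at any server in the cluster (placeholder address).
with tf.Session("grpc://host0:2222") as sess:
    print(sess.run(result))
```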
The `load a1` part depends on how your matrix is stored. If it's huge, then perhaps `load a1` will read it from disk. If it fits in memory, you can use `a1 = a[:, :m // 2]` (a column-wise split, where `m` is the number of columns) to get a partition of it.
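The direction of the split matters: with a row-wise split, `b1 + b2` would not reconstruct `a @ a.T`. A quick NumPy check of the underlying block identity (illustrative only):

```python
import numpy as np

n, m = 6, 10
a = np.random.rand(n, m)
a1, a2 = a[:, :m // 2], a[:, m // 2:]  # column-wise partition

# Block identity: a @ a.T == a1 @ a1.T + a2 @ a2.T
assert np.allclose(a @ a.T, a1 @ a1.T + a2 @ a2.T)
```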

Yaroslav Bulatov
- I understand that we can implement distributed matrix multiplication ourselves. Actually, if we want to multiply two different huge matrices, there are many options for parallel algorithms. My post is mainly asking whether Google has open-sourced an implementation of distributed matrix multiplication on TensorFlow. Now I guess there is none. This is somewhat surprising to me, because it assumes we cannot store a large model that exceeds memory in NN training. – Jia Zou Feb 21 '17 at 17:18
- The trend over the last few years has been to make neural networks smaller. I.e., it started with a 10-billion-parameter model that found cats in YouTube videos and moved to networks with millions of parameters that have much higher accuracy, are much smaller, and fit on mobile phones. – Yaroslav Bulatov Feb 21 '17 at 17:55
- Model parallelism is good parallelism in the sense that it reduces parallelization to an engineering problem, and all engineering problems can be solved. When I was at Google we tried hard to find a good application for model parallelism, but without success; the state of the art for neural networks is still typically achieved with models that can be handled by gaming GPUs. – Yaroslav Bulatov Feb 21 '17 at 17:58
- Now that I think of it, the TensorFlow Estimator API supports multiple devices, so it has distributed matrix multiplication under the covers. I suspect it just does the obvious thing, so it may be easier to start from scratch using the `tf.device` pattern rather than try to extract it from the code. – Yaroslav Bulatov Feb 21 '17 at 18:10