From the GitHub code, it seems the MatMul op doesn't support partitioned matrices. So is there any tool in TensorFlow that supports multiplication of two huge matrices that are distributed across multiple nodes?
- Check out this answer: [link](http://stackoverflow.com/questions/35564253/tensorflow-element-wise-matrix-multiplication?rq=1). I think it can help you. – Vahan Nasibyan Feb 17 '17 at 14:19
- This is not related, but thanks a lot for your comment. – Jia Zou Feb 21 '17 at 17:18
1 Answer
Support for distributing computation across machines is built into TensorFlow. I would recommend reading the distributed TensorFlow docs to figure out how to set up a TensorFlow cluster.
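As a rough illustration (a minimal sketch using the TF 1.x `tf.train.ClusterSpec`/`tf.train.Server` API; the hostnames, ports, and job name are placeholders), each machine in the cluster would run something like:

```python
import tensorflow as tf

# Describe the cluster: a single "worker" job with three tasks.
# The host:port addresses are placeholders.
cluster = tf.train.ClusterSpec({
    "worker": ["host0:2222", "host1:2222", "host2:2222"]
})

# Each machine starts one server, identified by its task_index.
server = tf.train.Server(cluster, job_name="worker", task_index=0)
server.join()  # block and serve graph execution requests
```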
Once the cluster is set up, you can decide how to partition your problem and use `with tf.device` to assign each worker its partition of the work.
For instance, suppose you are multiplying `a` by its transpose `a'`, and you want to split the intermediate multiplications evenly over two workers and aggregate the results on a third. You would do something like this:
```python
with tf.device(worker0):
    # load a1
    b1 = tf.matmul(a1, tf.transpose(a1))
with tf.device(worker1):
    # load a2
    b2 = tf.matmul(a2, tf.transpose(a2))
with tf.device(worker2):
    result = b1 + b2
```
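To make this concrete, here is a self-contained sketch of the same pattern; the device strings and the gRPC address are assumptions that would need to match your actual cluster:

```python
import numpy as np
import tensorflow as tf

# Hypothetical device strings for a 3-task "worker" job.
worker0 = "/job:worker/task:0"
worker1 = "/job:worker/task:1"
worker2 = "/job:worker/task:2"

n, m = 4, 6
a = np.random.rand(n, m).astype(np.float32)

with tf.device(worker0):
    a1 = tf.constant(a[:, :m // 2])       # left column block of a
    b1 = tf.matmul(a1, tf.transpose(a1))  # partial n x n product
with tf.device(worker1):
    a2 = tf.constant(a[:, m // 2:])       # right column block of a
    b2 = tf.matmul(a2, tf.transpose(a2))
with tf.device(worker2):
    result = b1 + b2                      # equals a @ a.T

# Point the session at any server in the cluster (placeholder address).
with tf.Session("grpc://host0:2222") as sess:
    print(sess.run(result))
```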
The `load a1` part depends on how your matrix is stored. If it's huge, then perhaps `load a1` will read it from disk. If it fits in memory, you can use `a1 = a[:, :m // 2]` (a column-wise split, where `m` is the number of columns) to get a partition of it.
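The direction of the split matters: with a row-wise split, `b1 + b2` would not reconstruct `a @ a.T`. A quick NumPy check of the underlying block identity (illustrative only):

```python
import numpy as np

n, m = 6, 10
a = np.random.rand(n, m)
a1, a2 = a[:, :m // 2], a[:, m // 2:]  # column-wise partition

# Block identity: a @ a.T == a1 @ a1.T + a2 @ a2.T
assert np.allclose(a @ a.T, a1 @ a1.T + a2 @ a2.T)
```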

Yaroslav Bulatov
- I understand that we can implement distributed matrix multiplication ourselves. Actually, if we want to multiply two different huge matrices, there are many options for parallel algorithms. My post is mainly asking whether Google has open-sourced an implementation of distributed matrix multiplication on TensorFlow. Now I guess there is none. This is somewhat surprising to me, because it assumes we cannot store a large model that exceeds memory in NN training. – Jia Zou Feb 21 '17 at 17:18
- The trend over the last few years has been to make neural networks smaller. I.e., it started with a 10-billion-parameter model that found cats in YouTube videos and moved to networks with millions of parameters that have much higher accuracy, are much smaller, and fit on mobile phones. – Yaroslav Bulatov Feb 21 '17 at 17:55
- Model parallelism is good parallelism in the sense that it reduces parallelization to an engineering problem, and all engineering problems can be solved. When I was at Google we tried hard to find a good application for model parallelism, but without success; the state of the art for neural networks is still typically achieved with models that can be handled by gaming GPUs. – Yaroslav Bulatov Feb 21 '17 at 17:58
- Now that I think of it, the TensorFlow Estimator API supports multiple devices, so it has distributed matrix multiplication under the covers. I suspect it just does the obvious thing, so it may be easier to start from scratch using the `tf.device` pattern rather than try to extract it from the code. – Yaroslav Bulatov Feb 21 '17 at 18:10