
I know that TensorFlow offers a Distributed Training API that can train across multiple devices such as GPUs, CPUs, TPUs, or multiple computers (workers), following this doc: https://www.tensorflow.org/tutorials/distribute/multi_worker_with_keras

But my question is: is there any way to split the training using data parallelism across multiple machines (including both mobile devices and computers)?

I would be really grateful for any tutorial or instructions.

  • So you want to use all the devices, be it a PDA, cellphone, or a desktop system, to train your model? – Yash Kumar Atri Apr 10 '20 at 08:44
  • @YashKumarAtri Yes. I want to split training across many different devices to reduce the time of the training phase. – TMN167 Apr 10 '20 at 08:58
  • It won't help. Small devices don't come with that powerful hardware; the whole idea is to train models on bigger machines and use model distillation for inference on mobile devices. But you can use computers on the same network for training. – Yash Kumar Atri Apr 10 '20 at 11:19

2 Answers


To my knowledge, TensorFlow only supports CPUs, GPUs, and TPUs for distributed training, and all the devices should be on the same network.

For connecting multiple machines, you can follow the Multi-worker training tutorial you mentioned, as sketched below.
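As a rough sketch (not taken verbatim from the tutorial; the worker addresses, port, and model are placeholders you would replace with your own), each machine sets a TF_CONFIG environment variable describing the cluster and then builds the model under a MultiWorkerMirroredStrategy scope:

import json
import os
import tensorflow as tf

# Hypothetical cluster of two workers on the same network; each
# machine runs this script with its own "index" (0 or 1).
os.environ["TF_CONFIG"] = json.dumps({
    "cluster": {"worker": ["192.168.0.10:12345", "192.168.0.11:12345"]},
    "task": {"type": "worker", "index": 0},
})

# TF_CONFIG must be set before the strategy is created.
strategy = tf.distribute.MultiWorkerMirroredStrategy()

with strategy.scope():
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(64, activation="relu", input_shape=(10,)),
        tf.keras.layers.Dense(1),
    ])
    model.compile(optimizer="adam", loss="mse")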

tf.distribute.Strategy is integrated into tf.keras, so when model.fit is used with a tf.distribute.Strategy instance, creating your model inside strategy.scope() turns its variables into distributed variables. This allows the strategy to divide your input data equally across your devices. You can follow this tutorial for more details.
Distributed input can also help; a sketch follows below.
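For instance, a minimal sketch of distributing a dataset (the synthetic data and batch size are placeholders, just to illustrate the API):

import tensorflow as tf

strategy = tf.distribute.MirroredStrategy()

# A synthetic tf.data pipeline; .batch() here takes the *global*
# batch size, which the strategy splits across replicas.
GLOBAL_BATCH_SIZE = 64
dataset = tf.data.Dataset.from_tensor_slices(
    (tf.random.normal([1024, 10]), tf.random.normal([1024, 1]))
).batch(GLOBAL_BATCH_SIZE)

# Each replica then receives its own slice of every global batch;
# this is the data-parallel split asked about in the question.
dist_dataset = strategy.experimental_distribute_dataset(dataset)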


On TensorFlow 2 you can use the following code:

import tensorflow as tf
from tensorflow.keras import models

mirrored_strategy = tf.distribute.MirroredStrategy()

with mirrored_strategy.scope():
    # Per the Keras guide linked below, both model construction and
    # the compile() call should happen inside the strategy scope.
    model = models.Sequential()
    ......
    model.compile(....)
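Filled in with placeholder layers and synthetic data (both are illustrative assumptions, not prescribed by the guides), the complete pattern looks like this:

import tensorflow as tf

mirrored_strategy = tf.distribute.MirroredStrategy()

with mirrored_strategy.scope():
    # Variables created here are mirrored across all local GPUs.
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(64, activation="relu", input_shape=(10,)),
        tf.keras.layers.Dense(1),
    ])
    model.compile(optimizer="adam", loss="mse")

# Synthetic data just to demonstrate the call; fit() behaves as usual,
# with each batch split across the available devices.
x = tf.random.normal([1024, 10])
y = tf.random.normal([1024, 1])
model.fit(x, y, batch_size=64, epochs=2)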

See: https://keras.io/guides/distributed_training/ and https://www.tensorflow.org/tutorials/distribute/keras