Questions tagged [distributed-tensorflow]

Use TensorFlow on multiple machines/devices.

Distributed TensorFlow is a set of techniques that allow the TensorFlow library to utilize multiple machines and/or devices, simultaneously or sequentially. It can be used to build, train, and deploy ML models, build ETL pipelines, or perform arbitrary computations. It covers the tf.distribute.Strategy API as well as older methods.

21 questions
7 votes · 2 answers

TensorFlow MirroredStrategy and Horovod distribution strategy

I am trying to understand the basic differences between TensorFlow's MirroredStrategy and Horovod's distribution strategy. From the documentation and from source-code investigation, I found that Horovod (https://github.com/horovod/horovod) is using…
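
For context, the two APIs differ mainly in how processes are launched: MirroredStrategy is a single process driving all local GPUs, while Horovod runs one process per GPU under horovodrun/mpirun. A minimal sketch of each (layer sizes and learning rate are illustrative, not from the question):

    import tensorflow as tf

    # tf.distribute.MirroredStrategy: one process, in-graph replication,
    # gradients all-reduced across the local GPUs (NCCL by default).
    strategy = tf.distribute.MirroredStrategy()
    with strategy.scope():
        model = tf.keras.Sequential([tf.keras.layers.Dense(1)])
        model.compile(optimizer="sgd", loss="mse")

    # Horovod: one process per GPU (launched with horovodrun), gradients
    # averaged with ring-allreduce via the wrapped optimizer.
    import horovod.tensorflow.keras as hvd
    hvd.init()
    optimizer = hvd.DistributedOptimizer(
        tf.keras.optimizers.SGD(0.01 * hvd.size()))
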
2 votes · 1 answer

TensorFlow: how to manually shard a dataset

I'm using MirroredStrategy to perform multi-GPU training, and it doesn't appear to be sharding the data properly. How do you go about manually sharding data? I know that I could use the shard method of a tf.data dataset, but for that I need…
Luke • 6,699
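
One common answer, sketched below: call tf.data.Dataset.shard with the worker count and index (hard-coded here for illustration; normally they come from TF_CONFIG), and switch off auto-sharding so the two mechanisms don't overlap:

    import tensorflow as tf

    num_workers, worker_index = 2, 0  # placeholders per worker

    dataset = tf.data.Dataset.from_tensor_slices(tf.range(100))
    # Each worker keeps every num_workers-th element, offset by its index.
    dataset = dataset.shard(num_shards=num_workers, index=worker_index)
    dataset = dataset.batch(32)

    # Disable tf.distribute's automatic sharding so the manual shard
    # above is the only one applied.
    options = tf.data.Options()
    options.experimental_distribute.auto_shard_policy = (
        tf.data.experimental.AutoShardPolicy.OFF)
    dataset = dataset.with_options(options)
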
2 votes · 0 answers

TensorFlow CentralStorageStrategy

The tf.distribute.experimental.CentralStorageStrategy documentation specifies that variables are not mirrored; instead, they are placed on the CPU and ops are replicated across all GPUs. If I have a really big model that does not fit on any single GPU, could this be a…
Jack Shi • 23
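
For reference, the strategy in question takes one line to enable; a minimal sketch (note it keeps variables on the CPU but does not, by itself, split a model that exceeds single-GPU memory):

    import tensorflow as tf

    # Variables live on the CPU (or the single GPU, if only one exists);
    # each step's ops run replicated on all visible GPUs.
    strategy = tf.distribute.experimental.CentralStorageStrategy()
    with strategy.scope():
        model = tf.keras.Sequential([tf.keras.layers.Dense(10)])
        model.compile(optimizer="sgd", loss="mse")
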
1 vote · 0 answers

How to build a TensorFlow cluster where each node can connect to any of the other nodes (1 to N-1)?

How can I build a TensorFlow cluster in which each node can make a connection to any of the other N-1 nodes? I checked the code, and the implementation is server-client with gRPC. Does that mean I should build a server and a client on each node so that…
skytree • 1,060
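
The usual pattern is to start one gRPC server per node, with every node holding the full cluster spec, so a channel to any of the other N-1 nodes can be opened on demand. A sketch with placeholder hostnames:

    import tensorflow as tf

    # Every node runs this with its own task_index; the ClusterSpec lists
    # all nodes, so any task can dial any other over gRPC.
    cluster = tf.train.ClusterSpec({
        "worker": ["node0:2222", "node1:2222", "node2:2222"],
    })
    server = tf.distribute.Server(cluster, job_name="worker", task_index=0)
    server.join()  # block and serve
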
1 vote · 1 answer

How to test distributed layers in TensorFlow?

I am trying to test a layer that I will later add to a distributed model, but I want to be sure that it works first. This is the layer in question: class BNShuffler(tf.Module): def __init__(self, global_batch_size: int=64 …
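
A common way to unit-test such a layer without multi-GPU hardware is to split the CPU into logical devices and drive the layer through strategy.run; a sketch (the doubling function below is a stand-in for the BNShuffler above):

    import tensorflow as tf

    # Split one physical CPU into two logical devices so MirroredStrategy
    # can run two replicas on a single machine.
    cpu = tf.config.list_physical_devices("CPU")[0]
    tf.config.set_logical_device_configuration(
        cpu, [tf.config.LogicalDeviceConfiguration()] * 2)

    strategy = tf.distribute.MirroredStrategy(["/cpu:0", "/cpu:1"])

    @tf.function
    def step(x):
        return x * 2.0  # stand-in for the layer under test

    per_replica = strategy.run(step, args=(tf.constant(1.0),))
    print(strategy.experimental_local_results(per_replica))
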
1 vote · 0 answers

How to broadcast with distributed TensorFlow

I want to broadcast some values from the chief to all workers with distributed TensorFlow, like MPI's bcast: https://mpi4py.readthedocs.io/en/stable/tutorial.html#collective-communication I guess broadcast_send or tf.raw_ops.CollectiveBcastSend…
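
Those raw ops do exist; the sketch below shows how the send/recv pair is usually wired up. The group and instance keys are arbitrary tags that must match on every task, and each branch runs on a different worker (running both in one process would block):

    import tensorflow as tf

    value = tf.constant([1.0, 2.0, 3.0])

    # On the chief (e.g. task 0): post the broadcast send.
    sent = tf.raw_ops.CollectiveBcastSend(
        input=value, group_size=2, group_key=1, instance_key=1,
        shape=value.shape)

    # On every other worker: post the matching recv with the same keys.
    received = tf.raw_ops.CollectiveBcastRecv(
        T=tf.float32, group_size=2, group_key=1, instance_key=1,
        shape=[3])
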
1 vote · 0 answers

Some questions about grpc+gdr and grpc+verbs when using distributed TensorFlow

When I use distributed TensorFlow, grpc+gdr performs worse than grpc+verbs, even though nv_peer_mem is loaded, and I don't know the difference between grpc+verbs and grpc+gdr. Can anyone help me? Some output is below: root@s36-2288H-V5:~# /etc/init.d/nv_peer_mem…
der Liu • 11
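
For reference, the transport is selected by the protocol argument when the server is created: grpc+verbs moves tensor payloads over RDMA, while grpc+gdr additionally targets GPU memory directly (GPUDirect, hence the nv_peer_mem module). A sketch, assuming a TensorFlow build with those transports compiled in and placeholder hostnames:

    import tensorflow as tf

    cluster = tf.train.ClusterSpec({"worker": ["node0:2222", "node1:2222"]})

    # protocol is one of "grpc", "grpc+verbs", "grpc+gdr", ...
    server = tf.distribute.Server(
        cluster, job_name="worker", task_index=0, protocol="grpc+verbs")
    server.join()
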
1 vote · 0 answers

Simple way to use a single GPU over IP in TensorFlow

I have been searching the web up and down but can't seem to find a simple answer. Basically, I have a desktop with one GPU and a laptop where my main code is. My goal is to use distributed TensorFlow to execute Python code on my laptop while…
Binary • 451
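
One minimal recipe (all addresses below are placeholders): run a bare tf.distribute.Server on the desktop, then connect the laptop's eager context to it and place ops on the remote GPU:

    import tensorflow as tf

    # On the desktop with the GPU:
    #   tf.distribute.Server(
    #       tf.train.ClusterSpec({"worker": ["0.0.0.0:2222"]}),
    #       job_name="worker", task_index=0).join()

    # On the laptop:
    tf.config.experimental_connect_to_cluster(
        tf.train.ClusterSpec({"worker": ["192.168.1.50:2222"]}))
    with tf.device("/job:worker/task:0/device:GPU:0"):
        y = tf.matmul(tf.ones([2, 2]), tf.ones([2, 2]))  # runs remotely
    print(y)  # result is fetched back to the laptop
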
1 vote · 1 answer

Distributed Keras MultiWorkerMirroredStrategy doesn't work with an embedding_column converted from a variable-length input feature

I am trying TensorFlow 2.0 and testing the distributed solution for Keras, but I've hit a problem: an embedding_column converted from a variable-length input feature doesn't work with the distributed Keras MultiWorkerMirroredStrategy. With local…
FelixHo • 1,254
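
For anyone reproducing this, the setup looks roughly as follows (the addresses in TF_CONFIG and the feature name "tags" are placeholders; the feature-column pair is the combination reported to fail):

    import json, os
    import tensorflow as tf

    # TF_CONFIG tells MultiWorkerMirroredStrategy who the workers are.
    os.environ["TF_CONFIG"] = json.dumps({
        "cluster": {"worker": ["host0:2222", "host1:2222"]},
        "task": {"type": "worker", "index": 0},
    })
    strategy = tf.distribute.experimental.MultiWorkerMirroredStrategy()

    # Variable-length categorical feature -> embedding_column.
    cat = tf.feature_column.categorical_column_with_hash_bucket(
        "tags", hash_bucket_size=100)
    emb = tf.feature_column.embedding_column(cat, dimension=8)

    with strategy.scope():
        model = tf.keras.Sequential([
            tf.keras.layers.DenseFeatures([emb]),
            tf.keras.layers.Dense(1),
        ])
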
1 vote · 0 answers

Is TLS supported in distributed TensorFlow gRPC communication?

I was wondering whether TLS is supported in current distributed TensorFlow with gRPC. I am reading through the code, https://github.com/tensorflow/tensorflow/blob/r1.14/tensorflow/core/distributed_runtime/rpc/grpc_server_lib.h#L105 and the implementation of…
JRH • 53
1 vote · 0 answers

How to create a custom distribution strategy in TensorFlow

I'm looking to write a custom distribution strategy for TensorFlow. Currently there are several types of strategies available (MirroredStrategy, TPUStrategy, ...), but I would like to implement a new way of distributing training across several…
Joao P • 38
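
There is no publicly documented extension point for this; the built-in strategies pair a tf.distribute.Strategy subclass with a StrategyExtended object that does the actual variable placement and replication work. A bare-bones sketch using internal APIs (these may change between TF versions):

    import tensorflow as tf
    from tensorflow.python.distribute import distribute_lib

    class MyExtended(distribute_lib.StrategyExtendedV2):
        """Would implement _create_variable, _call_for_each_replica, etc."""
        def __init__(self, container_strategy):
            super().__init__(container_strategy)

    class MyStrategy(tf.distribute.Strategy):
        def __init__(self):
            super().__init__(MyExtended(self))
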
1 vote · 1 answer

Implementing Mask R-CNN with distributed TensorFlow

I'm training a Mask R-CNN network, which is built on TensorFlow and Keras. I'm searching for a way to reduce training time, so I thought of implementing it with distributed TensorFlow. I've been working with Mask R-CNN for some time, but it seems what…
1 vote · 0 answers

Distributed TensorFlow error: Check failed: DeviceNameUtils::ParseFullName(new_base, &parsed_name)

Trying to run a distributed TensorFlow example on CPU from: https://github.com/tmulc18/Distributed-TensorFlow-Guide/blob/master/Distributed-Setup/dist_setup.py Commands to run the example can be found…
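
The linked guide uses the classic TF 1.x ps/worker layout; that check typically fires when a device string such as "/job:ps/task:0" is malformed or names a job missing from the ClusterSpec. The canonical setup, sketched with compat APIs and placeholder ports:

    import tensorflow.compat.v1 as tf
    tf.disable_v2_behavior()

    cluster = tf.train.ClusterSpec({
        "ps": ["localhost:2222"],
        "worker": ["localhost:2223"],
    })
    server = tf.train.Server(cluster, job_name="worker", task_index=0)

    # replica_device_setter pins variables to the ps task and ops to the
    # worker; a typo in these device strings triggers the ParseFullName check.
    with tf.device(tf.train.replica_device_setter(cluster=cluster)):
        v = tf.get_variable("v", shape=[])
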
1 vote · 0 answers

How to run multiprocessing Python with distributed TensorFlow on Slurm

I want to run a multiprocessing distributed TensorFlow program on Slurm. The script should use the Python multiprocessing library to open different sessions on different nodes in parallel. This approach works when testing with Slurm interactive…
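
A sketch of the pattern (hostnames are hard-coded here; under Slurm they would be derived from SLURM_JOB_NODELIST): each child process starts and serves one task of the cluster.

    import multiprocessing as mp
    import tensorflow as tf

    def run_task(job_name, task_index, cluster_def):
        # Each subprocess serves one task of the cluster.
        cluster = tf.train.ClusterSpec(cluster_def)
        tf.distribute.Server(
            cluster, job_name=job_name, task_index=task_index).join()

    if __name__ == "__main__":
        cluster_def = {"worker": ["localhost:2222", "localhost:2223"]}
        procs = [mp.Process(target=run_task, args=("worker", i, cluster_def))
                 for i in range(2)]
        for p in procs:
            p.start()
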
1 vote · 0 answers

Distributed execution under eager mode using TensorFlow

According to a recently published white paper and the RFC on GitHub, TensorFlow eager currently supports distributed execution. It is mentioned that, similar to graph mode, we can run an operation eagerly on a remote device by setting the device…
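
As described there, the client attaches its eager context to a remote server and then simply sets a device string; a sketch with a placeholder address:

    import tensorflow as tf

    # Attach the eager context to a remote worker (address is a placeholder).
    tf.config.experimental_connect_to_host(
        "192.168.1.50:2222", job_name="worker")

    with tf.device("/job:worker/replica:0/task:0/device:CPU:0"):
        x = tf.square(tf.constant(3.0))  # executes remotely, eagerly
    print(x)  # value is fetched back to the local client
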