1

How do I set up TensorFlow to work with the LSF job scheduler? I have almost no experience with LSF. tf.train.ClusterSpec needs the network addresses of the workers and parameter servers. Is it possible to obtain them from the LSF environment? Are there any success stories of making the two work together?

EDIT:

I found an explanation of how to achieve a similar goal on a Slurm cluster: "Running TensorFlow on a Slurm Cluster?". Basically, I'm looking for something like this, but for the LSF job scheduler.

Alexander Reshytko

2 Answers

1

There's a blog post and sample launch script for TensorFlow on LSF here.
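As a rough sketch of the general idea (this is not the linked script), the host list can be read from the environment that LSF sets for a job. The example below assumes the job sees LSB_MCPU_HOSTS in its usual "host slots host slots ..." form. The helper names, the base port 2222, and the choice to use the first task as the parameter server are all hypothetical and only for illustration.

```python
import os


def parse_mcpu_hosts(value):
    """Parse an LSB_MCPU_HOSTS-style string, e.g. "hostA 2 hostB 4",
    into a list of (host, slots) pairs."""
    tokens = value.split()
    if len(tokens) % 2 != 0:
        raise ValueError("expected alternating host/slot tokens: %r" % value)
    return [(tokens[i], int(tokens[i + 1])) for i in range(0, len(tokens), 2)]


def build_cluster(hosts, num_ps=1, base_port=2222):
    """Build a dict of the shape accepted by tf.train.ClusterSpec.

    One task is created per slot. Tasks on the same host get
    consecutive ports starting at base_port (an arbitrary choice).
    The first num_ps tasks become parameter servers, the rest workers.
    """
    addresses = []
    for host, slots in hosts:
        for slot in range(slots):
            addresses.append("%s:%d" % (host, base_port + slot))
    if len(addresses) <= num_ps:
        raise ValueError("need more than %d slots to have any workers" % num_ps)
    return {"ps": addresses[:num_ps], "worker": addresses[num_ps:]}


if __name__ == "__main__":
    # Fall back to a local two-slot layout when not running under LSF.
    raw = os.environ.get("LSB_MCPU_HOSTS", "localhost 2")
    cluster = build_cluster(parse_mcpu_hosts(raw))
    print(cluster)
    # The resulting dict could then be passed to tf.train.ClusterSpec(cluster).
```

Each process in the job would still need to work out its own job name and task index, for example by matching its hostname against the list; that part is left out here.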

Michael Closson

0

You could do this on LSF, but I don't recommend it. If you can use Docker, I would recommend going that route instead. LSF has a pile of other complications that can go wrong, and TensorFlow wasn't exactly designed to run on a system like LSF.

Docker Swarm and Compose have worked well in the past for me with this particular problem.

bR3nD4n