I am using TensorFlow-Slim to train a model from scratch. If I simply follow the instructions at https://github.com/tensorflow/models/tree/master/slim#training-a-model-from-scratch, everything works.
However, I want to train on multiple GPUs, so I set --num_clones=2 or --num_clones=4. Neither setting works: in both cases training gets stuck at global_step/sec: 0 and never makes progress. This is the command I run:
DATASET_DIR=/tmp/imagenet
TRAIN_DIR=/tmp/train_logs
python train_image_classifier.py \
--num_clones=4 \
--train_dir=${TRAIN_DIR} \
--dataset_name=imagenet \
--dataset_split_name=train \
--dataset_dir=${DATASET_DIR} \
--model_name=inception_v3
I am using TensorFlow 1.1 and Python 3.5 on Ubuntu 16.04. If you need more information, please let me know. Thanks in advance for any help.