I want to train the "Stanford chatbot" from https://github.com/chiphuyen/stanford-tensorflow-tutorials/tree/master/assignments/chatbot on a GPU, but it doesn't use my GPU, even though all the required libraries (cuDNN, CUDA, tensorflow-gpu, etc.) are installed. I tried:

def train():
    """ Train the bot """
    test_buckets, data_buckets, train_buckets_scale = _get_buckets()
    # in train mode, we need to create the backward path, so forward_only is False
    model = ChatBotModel(False, config.BATCH_SIZE)
    model.build_graph()

    saver = tf.train.Saver(var_list=tf.trainable_variables())

    with tf.Session(config=tf.ConfigProto(allow_soft_placement=True, log_device_placement=True)) as sess:
        print('Start training')
        sess.run(tf.global_variables_initializer())
        _check_restore_parameters(sess, saver)

        iteration = model.global_step.eval()
        total_loss = 0
        while True:
            skip_step = _get_skip_step(iteration)
            bucket_id = _get_random_bucket(train_buckets_scale)
            encoder_inputs, decoder_inputs, decoder_masks = data.get_batch(data_buckets[bucket_id],
                                                                           bucket_id,
                                                                           batch_size=config.BATCH_SIZE)
            start = time.time()
            _, step_loss, _ = run_step(sess, model, encoder_inputs, decoder_inputs, decoder_masks, bucket_id, False)
            total_loss += step_loss
            iteration += 1

            if iteration % skip_step == 0:
                print('Iteration {}: loss {}, time {}'.format(iteration, total_loss / skip_step, time.time() - start))
                start = time.time()
                total_loss = 0
                saver.save(sess, os.path.join(config.CPT_PATH, 'chatbot'), global_step=model.global_step)
                if iteration % (10 * skip_step) == 0:
                    # Run evals on development set and print their loss
                    _eval_test_set(sess, model, test_buckets)
                    start = time.time()
                sys.stdout.flush()

But it always shows:

InvalidArgumentError (see above for traceback): Cannot assign a device to node 'save/Const': Could not satisfy explicit device specification '/device:GPU:0' because no supported kernel for GPU devices is available.

Colocation Debug Info: Colocation group had the following types and devices: Const: CPU Identity: CPU [[Node: save/Const = Const[dtype=DT_STRING, value=Tensor, _device="/device:GPU:0"]]]

Is there some configuration file for TensorFlow where I can specify that only the GPU should be used, or some other way to achieve this? (I already tried with tf.device("/gpu:0"): and device_count={'GPU': 1}.)
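For reference, here is roughly what those two attempts looked like (a minimal sketch; ChatBotModel and config come from the repository above, and the exact placement in my code may differ):

import tensorflow as tf

# Attempt 1: pin the whole graph (including the Saver) to the GPU
with tf.device("/gpu:0"):
    model = ChatBotModel(False, config.BATCH_SIZE)
    model.build_graph()
    saver = tf.train.Saver(var_list=tf.trainable_variables())

# Attempt 2: limit the session to a single GPU device
sess_config = tf.ConfigProto(device_count={'GPU': 1},
                             log_device_placement=True)
with tf.Session(config=sess_config) as sess:
    sess.run(tf.global_variables_initializer())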

1 Answer

From your error:

Could not satisfy explicit device specification '/device:GPU:0' because no supported kernel for GPU devices is available.

That means the 'save/Const' operation cannot be forcibly assigned to a GPU via with tf.device(): because there is no GPU implementation (kernel) for it. Remove the with tf.device(): part (or move that operation outside of it) and let TF decide where to put each operation (it will prefer the GPU over the CPU anyway).
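A minimal sketch of what that could look like, assuming the rest of train() stays as in the question (ChatBotModel and config come from the linked repository): keep the Saver outside any tf.device() block and let soft placement fall back to the CPU for ops that only have CPU kernels:

import tensorflow as tf

# Build the graph without forcing a device; TF places GPU-capable ops
# (matmuls, RNN cells, ...) on the GPU automatically when one is available.
model = ChatBotModel(False, config.BATCH_SIZE)
model.build_graph()

# 'save/Const' only has a CPU kernel, so the Saver must not be pinned to the GPU
saver = tf.train.Saver(var_list=tf.trainable_variables())

sess_config = tf.ConfigProto(allow_soft_placement=True,   # fall back to CPU when no GPU kernel exists
                             log_device_placement=True)   # print where each op ends up
with tf.Session(config=sess_config) as sess:
    sess.run(tf.global_variables_initializer())
    # ... training loop as in the question ...

With log_device_placement enabled, the placement log then shows which operations actually end up on gpu:0.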

GPhilo
  • I did that, but TensorFlow chooses the CPU instead of the GPU – Никита Иванюшкин Oct 09 '17 at 10:40
  • Considering that you didn't show what your model looks like, I can't help with the specifics of why the operations end up on the CPU. Some ops can only be on the CPU (like the saving one that throws your error), while most of the computation ops have both CPU and GPU implementations. What does the output look like if you enable device placement logging? – GPhilo Oct 09 '17 at 10:44
  • Everything that is on `/job:localhost/replica:0/task:0/gpu:0` is assigned (and run) on your GPU. – GPhilo Oct 09 '17 at 10:58
  • The output ends in `encoder2: (Placeholder): /job:localhost/replica:0/task:0/cpu:0 encoder1: (Placeholder): /job:localhost/replica:0/task:0/cpu:0 training/GradientDescent/value: (Const): /job:localhost/replica:0/task:0/cpu:0` And the training goes very slowly – Никита Иванюшкин Oct 09 '17 at 11:01
  • The author of this chatbot mentioned that he trained it on a GPU, but he didn't explain how; that is what I tried to do. Here is the article http://web.stanford.edu/class/cs20si/assignments/a3.pdf – Никита Иванюшкин Oct 09 '17 at 11:05
  • From the (deleted) comments you posted, I can see that you do indeed have operations on the GPU. Note however that not **all** operations can be placed on the GPU. The log tells you where every operation in the graph is placed, so it doesn't matter how it ends; those are just the latest operations performed in the graph. As for why the training is slow, that can depend on a million things, but the GPU not being used is not your case (although it is possible that the GPU is not being used to its full capability; optimizing TensorFlow code is not simple) – GPhilo Oct 09 '17 at 11:07