I am working on a machine-learning script with tflearn and gym.

I can get one network working in my Python script, but whenever I call my functions to build a second or third network and train it with model.fit, I get a

tensorflow.python.framework.errors_impl.InvalidArgumentError

Edit: The goal is to build several different networks in order to compare them. At first the comparison should focus only on the input data and the number of training epochs, but eventually I'd like to compare different network sizes. I'd also like to run this in a loop, building more than two networks.

The following code reproduces my error:

  • initial_population(pop_size): plays pop_size games with random actions and returns the recorded [observation, action] pairs as an array of training data

  • neural_network_model(input_size): creates a neural network

  • train_model(training_data): creates a new model if none is passed, and trains the model on the provided training data

import gym
import random
import numpy as np
import tflearn
from tflearn.layers.core import input_data, dropout, fully_connected
from tflearn.layers.estimator import regression

LR = 1e-3
env = gym.make('CartPole-v0')
env.reset()
goal_steps = 500
score_requirement = 1


def initial_population(pop_size):

    training_data = []
    scores = []
    accepted_scores = []
    for _ in range(pop_size):
        score = 0
        game_memory = []
        prev_observation = []
        for _ in range(goal_steps):
            action = random.randrange(0,2)
            observation, reward, done, info = env.step(action)
            if len(prev_observation) > 0:
                game_memory.append([prev_observation, action])
            prev_observation = observation
            score += reward
            if done:
                break
        if score >= score_requirement:
            accepted_scores.append(score)
            for data in game_memory:
                if data[1] == 1:
                    output = [0,1]
                elif data[1] == 0:
                    output = [1,0]
                training_data.append([data[0], output])
        env.reset()
        scores.append(score)
    return np.array(training_data)


def neural_network_model(input_size):

    network = input_data(shape=[None, input_size, 1], name='input')
    network = fully_connected(network, 128, activation='relu')
    network = dropout(network, 0.8)
    network = fully_connected(network, 2, activation='softmax')
    network = regression(network, optimizer='adam', learning_rate=LR,
                         loss='categorical_crossentropy', name='targets')
    model = tflearn.DNN(network, tensorboard_dir='log')
    return model


def train_model(training_data, model=False, n_training_epochs=5):

    X = np.array([i[0] for i in training_data]).reshape(-1, len(training_data[0][0]), 1)
    Y = [i[1] for i in training_data]
    if not model:
        model = neural_network_model(input_size = len(X[0]))
    model.fit({'input':X}, {'targets':Y}, n_epoch=n_training_epochs, snapshot_step=500, show_metric=True)
    return model


if __name__ == "__main__":

    training_data = initial_population(5)
    print("still alive 1")
    model = train_model(training_data, n_training_epochs=1)
    print("still alive 2")
    training_data = initial_population(1)
    print("still alive 3")
    model = train_model(training_data, n_training_epochs=1)
    print("still alive 4")

With the output:

C:\Users\username\AppData\Local\Programs\Python\Python36\python.exe C:/Users/username/.PyCharm2017.1/config/scratches/scratch.py
curses is not supported on this machine (please install/reinstall curses for an optimal experience)
still alive 1
2017-11-21 01:03:45.096492: I C:\tf_jenkins\home\workspace\rel-win\M\windows-gpu\PY\36\tensorflow\core\platform\cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX AVX2
2017-11-21 01:03:45.355914: I C:\tf_jenkins\home\workspace\rel-win\M\windows-gpu\PY\36\tensorflow\core\common_runtime\gpu\gpu_device.cc:1030] Found device 0 with properties: 
name: GeForce GTX 980 Ti major: 5 minor: 2 memoryClockRate(GHz): 1.228
pciBusID: 0000:01:00.0
totalMemory: 6.00GiB freeMemory: 4.97GiB
2017-11-21 01:03:45.356242: I C:\tf_jenkins\home\workspace\rel-win\M\windows-gpu\PY\36\tensorflow\core\common_runtime\gpu\gpu_device.cc:1120] Creating TensorFlow device (/device:GPU:0) -> (device: 0, name: GeForce GTX 980 Ti, pci bus id: 0000:01:00.0, compute capability: 5.2)
2017-11-21 01:03:46.394283: I C:\tf_jenkins\home\workspace\rel-win\M\windows-gpu\PY\36\tensorflow\core\common_runtime\gpu\gpu_device.cc:1120] Creating TensorFlow device (/device:GPU:0) -> (device: 0, name: GeForce GTX 980 Ti, pci bus id: 0000:01:00.0, compute capability: 5.2)
---------------------------------
Run id: BCIV9S
Log directory: log/
---------------------------------
Training samples: 137
Validation samples: 0
--
Training Step: 1  | time: 0.224s
| Adam | epoch: 001 | loss: 0.00000 - acc: 0.0000 -- iter: 064/137
Training Step: 2  | total loss: 0.62389 | time: 0.234s
| Adam | epoch: 001 | loss: 0.62389 - acc: 0.4500 -- iter: 128/137
Training Step: 3  | total loss: 0.68097 | time: 0.239s
| Adam | epoch: 001 | loss: 0.68097 - acc: 0.3631 -- iter: 137/137
--
still alive 2
still alive 3
2017-11-21 01:03:47.234643: I C:\tf_jenkins\home\workspace\rel-win\M\windows-gpu\PY\36\tensorflow\core\common_runtime\gpu\gpu_device.cc:1120] Creating TensorFlow device (/device:GPU:0) -> (device: 0, name: GeForce GTX 980 Ti, pci bus id: 0000:01:00.0, compute capability: 5.2)
2017-11-21 01:03:48.302791: I C:\tf_jenkins\home\workspace\rel-win\M\windows-gpu\PY\36\tensorflow\core\common_runtime\gpu\gpu_device.cc:1120] Creating TensorFlow device (/device:GPU:0) -> (device: 0, name: GeForce GTX 980 Ti, pci bus id: 0000:01:00.0, compute capability: 5.2)
---------------------------------
Run id: HHBWWQ
Log directory: log/
---------------------------------
Training samples: 20
Validation samples: 0
--
2017-11-21 01:03:49.928408: W C:\tf_jenkins\home\workspace\rel-win\M\windows-gpu\PY\36\tensorflow\core\framework\op_kernel.cc:1192] Invalid argument: You must feed a value for placeholder tensor 'input_1/X' with dtype float and shape [?,4,1]
     [[Node: input_1/X = Placeholder[dtype=DT_FLOAT, shape=[?,4,1], _device="/job:localhost/replica:0/task:0/device:GPU:0"]()]]
2017-11-21 01:03:49.928684: W C:\tf_jenkins\home\workspace\rel-win\M\windows-gpu\PY\36\tensorflow\core\framework\op_kernel.cc:1192] Invalid argument: You must feed a value for placeholder tensor 'input_1/X' with dtype float and shape [?,4,1]
     [[Node: input_1/X = Placeholder[dtype=DT_FLOAT, shape=[?,4,1], _device="/job:localhost/replica:0/task:0/device:GPU:0"]()]]
Traceback (most recent call last):
  File "C:\Users\username\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\client\session.py", line 1323, in _do_call
    return fn(*args)
  File "C:\Users\username\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\client\session.py", line 1302, in _run_fn
    status, run_metadata)
  File "C:\Users\username\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\framework\errors_impl.py", line 473, in __exit__
    c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.InvalidArgumentError: You must feed a value for placeholder tensor 'input_1/X' with dtype float and shape [?,4,1]
     [[Node: input_1/X = Placeholder[dtype=DT_FLOAT, shape=[?,4,1], _device="/job:localhost/replica:0/task:0/device:GPU:0"]()]]
     [[Node: Dropout_1/cond/Merge/_119 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_274_Dropout_1/cond/Merge", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:/Users/username/.PyCharm2017.1/config/scratches/scratch.py", line 69, in <module>
    model = train_model(training_data, n_training_epochs=1)
  File "C:/Users/username/.PyCharm2017.1/config/scratches/scratch.py", line 58, in train_model
    model.fit({'input':X}, {'targets':Y}, n_epoch=n_training_epochs, snapshot_step=500, show_metric=True)
  File "C:\Users\username\AppData\Local\Programs\Python\Python36\lib\site-packages\tflearn\models\dnn.py", line 216, in fit
    callbacks=callbacks)
  File "C:\Users\username\AppData\Local\Programs\Python\Python36\lib\site-packages\tflearn\helpers\trainer.py", line 339, in fit
    show_metric)
  File "C:\Users\username\AppData\Local\Programs\Python\Python36\lib\site-packages\tflearn\helpers\trainer.py", line 818, in _train
    feed_batch)
  File "C:\Users\username\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\client\session.py", line 889, in run
    run_metadata_ptr)
  File "C:\Users\username\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\client\session.py", line 1120, in _run
    feed_dict_tensor, options, run_metadata)
  File "C:\Users\username\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\client\session.py", line 1317, in _do_run
    options, run_metadata)
  File "C:\Users\username\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\client\session.py", line 1336, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: You must feed a value for placeholder tensor 'input_1/X' with dtype float and shape [?,4,1]
     [[Node: input_1/X = Placeholder[dtype=DT_FLOAT, shape=[?,4,1], _device="/job:localhost/replica:0/task:0/device:GPU:0"]()]]
     [[Node: Dropout_1/cond/Merge/_119 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_274_Dropout_1/cond/Merge", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]

Caused by op 'input_1/X', defined at:
  File "C:/Users/username/.PyCharm2017.1/config/scratches/scratch.py", line 69, in <module>
    model = train_model(training_data, n_training_epochs=1)
  File "C:/Users/username/.PyCharm2017.1/config/scratches/scratch.py", line 57, in train_model
    model = neural_network_model(input_size = len(X[0]))
  File "C:/Users/username/.PyCharm2017.1/config/scratches/scratch.py", line 44, in neural_network_model
    network = input_data(shape=[None, input_size, 1], name='input')
  File "C:\Users\username\AppData\Local\Programs\Python\Python36\lib\site-packages\tflearn\layers\core.py", line 81, in input_data
    placeholder = tf.placeholder(shape=shape, dtype=dtype, name="X")
  File "C:\Users\username\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\ops\array_ops.py", line 1599, in placeholder
    return gen_array_ops._placeholder(dtype=dtype, shape=shape, name=name)
  File "C:\Users\username\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\ops\gen_array_ops.py", line 3090, in _placeholder
    "Placeholder", dtype=dtype, shape=shape, name=name)
  File "C:\Users\username\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\framework\op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
  File "C:\Users\username\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\framework\ops.py", line 2956, in create_op
    op_def=op_def)
  File "C:\Users\username\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\framework\ops.py", line 1470, in __init__
    self._traceback = self._graph._extract_stack()  # pylint: disable=protected-access

InvalidArgumentError (see above for traceback): You must feed a value for placeholder tensor 'input_1/X' with dtype float and shape [?,4,1]
     [[Node: input_1/X = Placeholder[dtype=DT_FLOAT, shape=[?,4,1], _device="/job:localhost/replica:0/task:0/device:GPU:0"]()]]
     [[Node: Dropout_1/cond/Merge/_119 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_274_Dropout_1/cond/Merge", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]


Process finished with exit code 1

The critical part seems to be that model.fit does not receive the input it expects the second time it is called: the error says that no value was fed for the placeholder 'input_1/X'. It looks as though both instances share some variables, data, etc., which breaks something.

For plain TensorFlow, I've seen that you may need a separate session for every new model, but I don't know whether that applies to the tflearn package.

I am working on Windows 10 and Python 3.6.

1 Answer

One way to get this to work is by changing the second call to train_model to train_model(training_data, model, n_training_epochs=1), so that it reuses the model it created in the first call. This doesn't seem to be quite what you want, since you mention trying to build up a second network.

Creating a second model in the same session does seem to cause issues, but you can create a single model and save it using model.save, and then run your program again and save another model to a different file.

From your question it's not entirely clear what you're trying to accomplish, so I'm not sure if either of these will work for you.

Edit: Okay, I think I've figured out how to do what you want. If you don't specify which graph you want to use then TensorFlow puts everything into the default graph. You can specify that you want things to be in separate graphs as follows:

import tensorflow as tf  # This can be at the top of the file if you prefer
graph1 = tf.Graph()
with graph1.as_default():
    training_data = initial_population(5)
    print("still alive 1")
    model = train_model(training_data, n_training_epochs=1)
    print("still alive 2")

graph2 = tf.Graph()
with graph2.as_default():
    training_data = initial_population(1)
    print("still alive 3")
    model = train_model(training_data, n_training_epochs=1)
    print("still alive 4")
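
The same idea may extend to a loop over more than two networks. The following is a minimal sketch that has not been run against tflearn itself: compare_networks, its configs argument and the new_graph hook are hypothetical names introduced here for illustration, and it assumes the TensorFlow 1.x tf.Graph API used above together with the train_model and initial_population functions from the question.

```python
def compare_networks(configs, build_and_train, new_graph=None):
    """Call build_and_train(**config) once per entry, each inside its own graph.

    configs:         dict mapping a run name to keyword arguments (hypothetical).
    build_and_train: callable that builds and trains one model and returns it.
    new_graph:       zero-argument factory returning an object with an
                     as_default() context manager; defaults to tf.Graph.
    """
    if new_graph is None:
        # Imported lazily so the helper can be exercised without TensorFlow.
        import tensorflow as tf
        new_graph = tf.Graph

    results = {}
    for name, config in configs.items():
        graph = new_graph()
        # Everything created inside this block lands in `graph`, not in the
        # default graph shared with the previously built networks.
        with graph.as_default():
            results[name] = build_and_train(**config)
    return results


# Possible usage with the functions from the question (assumption, untested):
#
# def run(pop_size, n_training_epochs):
#     data = initial_population(pop_size)
#     return train_model(data, n_training_epochs=n_training_epochs)
#
# models = compare_networks(
#     {"pop5_epoch1": {"pop_size": 5, "n_training_epochs": 1},
#      "pop1_epoch1": {"pop_size": 1, "n_training_epochs": 1},
#      "pop5_epoch3": {"pop_size": 5, "n_training_epochs": 3}},
#     run,
# )
```

Each run builds its network from scratch, so nothing learned by an earlier model carries over. TensorFlow 1.x also provides tf.reset_default_graph(), which might serve as an alternative when the earlier models no longer need to be kept around.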
Stephen
  • I added my goals to the post, so it should be a little clearer. Running the script multiple times with different command-line arguments, for example, wouldn't be a problem, but I'm still curious whether it works in one script. Your suggested solution of calling `train_model(data, model, 1)` works, but unfortunately it doesn't erase what the network has already learned, so it isn't suitable. – Niclas Eich Nov 21 '17 at 10:51