tflearn loss is always 0.0 while training reinforcement learning agent

Question

I tried to train a reinforcement learning agent with gym and tflearn using this code:

from tflearn import *
import gym
import numpy as np

env = gym.make('CartPole-v0')
x = []
y = []
max_reward = 0

for i in range(1000):
    env.reset()
    while True:
        action = env.action_space.sample()
        observation, reward, done, info = env.step(action)
        if done:
            break
        if reward >= max_reward:
            x.append(observation)
            y.append(np.array([action]))
x = np.asarray(x)
y = np.asarray(y)

net = input_data((None,4))
net = fully_connected(net,8,'softmax')
net = fully_connected(net,16,'softmax')
net = fully_connected(net,32,'softmax')
net = fully_connected(net,64,'softmax')
net = fully_connected(net,128,'softmax')
net = fully_connected(net,64,'softmax')
net = fully_connected(net,32,'softmax')
net = fully_connected(net,16,'softmax')
net = fully_connected(net,8,'softmax')
net = fully_connected(net,4,'softmax')
net = fully_connected(net,2,'softmax')
net = fully_connected(net,1)
net = regression(net,optimizer='adam',learning_rate=0.01,loss='categorical_crossentropy',batch_size=1)
model = DNN(net)

model.fit(x,y,10)
model.save('saved/model.tflearn')

The problem is that the loss is always 0.0 while the model is training. Can someone help me with this issue?

Why do you have so many `softmax` layers? Did you mean to use `sigmoid` or `relu`? — Julio Daniel Reyes, Oct 22 '17 at 16:50
Actually, that was intentional. But it's entirely possible that it's a mistake, because I'm a total machine learning novice. — Kay Jersch, Oct 22 '17 at 20:59

score 0 · Answer 1 · answered Oct 23 '17 at 04:11

I'm not sure what your objective is, but categorical_crossentropy is a loss function for multiclass classification, whereas the output of your network is a single unit, fully_connected(net,1), with a linear activation. With only one output class there is nothing for the loss to distinguish between, which is why you are getting a loss of 0.
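To make the zero loss concrete, here is a small NumPy sketch. It assumes the loss rescales each row of predictions to sum to 1 across the class axis before taking the log, which is a common way to implement categorical cross-entropy. The function and the sample values are made up for illustration and are not tflearn's actual code.

```python
import numpy as np

def categorical_crossentropy(y_true, y_pred, eps=1e-10):
    # Assumption: predictions are rescaled to sum to 1 across the class axis.
    y_pred = y_pred / np.sum(y_pred, axis=-1, keepdims=True)
    # Clip to avoid log(0).
    y_pred = np.clip(y_pred, eps, 1.0)
    # Per-sample cross-entropy, averaged over the batch.
    return float(np.mean(-np.sum(y_true * np.log(y_pred), axis=-1)))

# One output unit: every row normalizes to exactly 1, and log(1) is 0.
y_true_one = np.array([[0.0], [1.0], [1.0]])
y_pred_one = np.array([[0.3], [2.5], [0.7]])
single_unit_loss = categorical_crossentropy(y_true_one, y_pred_one)

# Two output units with one-hot labels: the loss depends on the predictions.
y_true_two = np.array([[1.0, 0.0], [0.0, 1.0]])
y_pred_two = np.array([[0.6, 0.4], [0.3, 0.7]])
two_unit_loss = categorical_crossentropy(y_true_two, y_pred_two)
```

Under that assumption the single-unit loss is zero whatever the network predicts, so there is no gradient signal to learn from, while the two-unit case gives a positive loss.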

Try mean_square or even binary_crossentropy instead and you will see non-zero loss values.
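As a rough illustration of why these two losses behave differently with a single output unit, the sketch below computes both by hand in NumPy. The helper functions and sample values are hypothetical; tflearn's own implementations may differ in detail (for example, in whether they expect probabilities or raw logits).

```python
import numpy as np

def mean_square(y_true, y_pred):
    # Average squared difference between predictions and labels.
    return float(np.mean((y_pred - y_true) ** 2))

def binary_crossentropy(y_true, y_prob, eps=1e-7):
    # Assumption: y_prob already holds probabilities in (0, 1).
    y_prob = np.clip(y_prob, eps, 1.0 - eps)
    losses = -(y_true * np.log(y_prob) + (1.0 - y_true) * np.log(1.0 - y_prob))
    return float(np.mean(losses))

# Labels are the actions (0 or 1); predictions come from a single output unit.
y_true = np.array([[0.0], [1.0], [1.0]])
y_prob = np.array([[0.3], [0.6], [0.7]])

mse = mean_square(y_true, y_prob)
bce = binary_crossentropy(y_true, y_prob)
```

Both values are positive and change as the predictions move toward or away from the labels, so the optimizer has something to minimize.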

I would use a sigmoid activation on the last layer and relu on the rest.
