I'm experimenting with deep Q-learning using Keras, and I want to teach an agent to perform a task.

In my problem, the agent moves horizontally and the objects to avoid move vertically, and I want the agent to learn to change its speed (accelerate or decelerate) so that it avoids hitting them. I based my code on this: Keras-FlappyBird.
I tried 3 different models (I'm not using a convolutional network):

- a model with 10 dense hidden layers with sigmoid activations, with 400 output nodes
- a model with 10 dense hidden layers with Leaky ReLU activations
- a model with 10 dense hidden layers with ReLU activations, with 400 output nodes
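For reference, here is a minimal sketch of what I mean by the Leaky ReLU variant (the hidden-layer width of 64 units is an assumption for illustration; my actual state size depends on the number of objects):

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Input, LeakyReLU

STATE_SIZE = 20    # hypothetical: coordinates + speeds of all objects
NUM_ACTIONS = 400  # number of output nodes, one Q-value per action

def build_model(state_size=STATE_SIZE, num_actions=NUM_ACTIONS):
    model = Sequential()
    model.add(Input(shape=(state_size,)))
    for _ in range(10):          # 10 dense hidden layers
        model.add(Dense(64))     # assumed width; not stated above
        model.add(LeakyReLU())
    # linear output layer: one Q-value per action
    model.add(Dense(num_actions, activation="linear"))
    model.compile(optimizer="adam", loss="mse")
    return model

model = build_model()
```

The sigmoid and ReLU variants only differ in the activation used in the hidden layers.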
I feed the network the coordinates and speeds of all the objects in my world, and I trained it for 1 million frames, but I still can't see any result. Here are my Q-value plots for the 3 models:
Model 1 : q-value
Model 2 : q-value
As you can see, the Q-values aren't improving at all, and neither is the reward. Please help me figure out what I'm doing wrong.
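For the Q-value plots above, I'm tracking the average of the maximum predicted Q-value over states, which (as in the original DQN paper) is usually computed on a fixed held-out batch of states so the curve is comparable across training. A minimal NumPy sketch of that metric (the Q-value array below is made-up example data):

```python
import numpy as np

def average_max_q(q_values):
    """Average of max_a Q(s, a) over a fixed batch of evaluation states.

    q_values: array of shape (num_states, num_actions), e.g. the output
    of model.predict() on a held-out batch of states.
    """
    return float(np.mean(np.max(q_values, axis=1)))

# hypothetical Q-value outputs for 3 evaluation states, 4 actions each
q = np.array([[0.1, 0.5, 0.2, 0.0],
              [1.0, 0.3, 0.7, 0.2],
              [0.0, 0.0, 0.4, 0.1]])
print(average_max_q(q))  # (0.5 + 1.0 + 0.4) / 3 ≈ 0.6333
```

If the states used for the plot are re-sampled every time, the curve can look flat or noisy even when the network is learning, so a fixed evaluation set makes the plot much easier to read.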