I followed a tutorial to implement a DQN for the Taxi domain.
However, when I predict values for a batch, every input gets the same output.
The input to the embedding layer is a single float in the interval [0, 1]
that can take 500 possible values. The batch size is 16, so a batch
can look like this, for instance:
[[0.206]
[0.816]
[0.768]
[0.046]
[0.902]
[0.384]
[0.302]
[0.984]
[0.588]
[0.524]
[0.164]
[0.102]
[0.606]
[0.224]
[0.728]
[0.566]]
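For context, every value above is a multiple of 0.002, which is consistent with integer state indices (0 to 499) divided by 500. That scaling is an assumption about how the batch was built, not something stated above, and the names `N_STATES`, `encode`, and `decode` are illustrative only. A minimal sketch of the mapping, and of what a plain integer cast does to such floats:

```python
import numpy as np

N_STATES = 500  # assumed: number of discrete Taxi states


def encode(states):
    """Scale integer state indices to floats in [0, 1) (assumed preprocessing)."""
    return (np.asarray(states, dtype=np.float32) / N_STATES).reshape(-1, 1)


def decode(batch):
    """Recover integer state indices from the scaled floats by rounding."""
    return np.rint(np.asarray(batch) * N_STATES).astype(np.int64).reshape(-1)


# The first four rows of the batch shown above, written as state indices.
batch = encode([103, 408, 384, 23])

# Rounding after rescaling recovers the original indices.
indices = decode(batch)

# A plain integer cast truncates toward zero instead of rescaling.
truncated = batch.astype(np.int32).reshape(-1)

print(batch.reshape(-1))
print(indices)
print(truncated)
```

Under this assumption, the information that distinguishes one state from another survives only if the floats are rescaled before being turned into integers.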
However, model.predict_on_batch(batch) returns exactly the same values
for every row of the batch:
[[-0.01847944 -0.04587542 -0.01173695 -0.04059657 0.00310457 0.01856036]
[-0.01847944 -0.04587542 -0.01173695 -0.04059657 0.00310457 0.01856036]
[-0.01847944 -0.04587542 -0.01173695 -0.04059657 0.00310457 0.01856036]
[-0.01847944 -0.04587542 -0.01173695 -0.04059657 0.00310457 0.01856036]
[-0.01847944 -0.04587542 -0.01173695 -0.04059657 0.00310457 0.01856036]
[-0.01847944 -0.04587542 -0.01173695 -0.04059657 0.00310457 0.01856036]
[-0.01847944 -0.04587542 -0.01173695 -0.04059657 0.00310457 0.01856036]
[-0.01847944 -0.04587542 -0.01173695 -0.04059657 0.00310457 0.01856036]
[-0.01847944 -0.04587542 -0.01173695 -0.04059657 0.00310457 0.01856036]
[-0.01847944 -0.04587542 -0.01173695 -0.04059657 0.00310457 0.01856036]
[-0.01847944 -0.04587542 -0.01173695 -0.04059657 0.00310457 0.01856036]
[-0.01847944 -0.04587542 -0.01173695 -0.04059657 0.00310457 0.01856036]
[-0.01847944 -0.04587542 -0.01173695 -0.04059657 0.00310457 0.01856036]
[-0.01847944 -0.04587542 -0.01173695 -0.04059657 0.00310457 0.01856036]
[-0.01847944 -0.04587542 -0.01173695 -0.04059657 0.00310457 0.01856036]
[-0.01847944 -0.04587542 -0.01173695 -0.04059657 0.00310457 0.01856036]]
And this is the network architecture:
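from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, Reshape, Dense
from tensorflow.keras.optimizers import Adam

# LEARNING_RATE is defined elsewhere in the script.
model = Sequential()
model.add(Embedding(500, 10, input_length=1))  # 500 states -> 10-dim vectors
model.add(Reshape((10,)))
model.add(Dense(32, input_shape=(1,), activation='relu'))
model.add(Dense(16, activation='relu'))
model.add(Dense(6, activation='linear'))  # one Q-value per action
model.compile(loss='mse', optimizer=Adam(learning_rate=LEARNING_RATE))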
Why don't different inputs produce different predictions?
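One thing that may be worth checking: an embedding layer is a table lookup by integer index, and Keras Embedding casts non-integer inputs to integers, so floats in [0, 1) would all truncate to index 0 and select the same row. The NumPy sketch below only mimics that lookup with a random table; `table` and `lookup` are hypothetical stand-ins, not Keras code, and it is a sketch of a possible cause rather than a confirmed diagnosis.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for the Embedding weights: 500 rows (states), 10 columns.
table = rng.normal(size=(500, 10))


def lookup(batch):
    """Mimic an embedding lookup: cast to int (truncating), then index rows."""
    idx = np.asarray(batch).astype(np.int32).reshape(-1)
    return table[idx]


float_batch = np.array([[0.206], [0.816], [0.768]])  # scaled floats
int_batch = np.array([[103], [408], [384]])          # integer state indices

same = lookup(float_batch)      # every float truncates to index 0
different = lookup(int_batch)   # each index selects its own row

print(np.allclose(same, same[0]))
print(np.allclose(different[0], different[1]))
```

If this is what happens in the real model, every layer after the embedding sees identical vectors, which would match the identical rows in the output. Passing integer state indices with shape (batch_size, 1) would be one way to test that hypothesis.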