
I am trying to use Keras and Keras-RL to train a neural network that efficiently explores a grid to locate an object. On every "step", the agent picks a direction to explore by choosing a number from 0 to 8, each of which corresponds to a cardinal or intermediate direction.

(Using reinforcement learning for this simple task is clearly not the best choice, as a simple algorithm could easily scan back and forth to achieve the goal. However, this serves more as a "tech demo" and challenge to myself.)

The following diagram represents all possible choices. 0 indicates northwest, 1 indicates north, 2 indicates northeast, etc. Note that 4 represents the choice to stay stationary.

0 1 2
3 4 5
6 7 8
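
For concreteness, the chosen index translates into a movement offset roughly as follows (a sketch, not my exact code; the (dx, dy) convention and helper name are illustrative):

# Map each action index 0-8 to a (dx, dy) offset on the grid.
# Row 0 of the diagram is "north", so dy = -1 moves the explorer up;
# action 4 maps to (0, 0), i.e. stay stationary.
DIRECTIONS = [
    (-1, -1), (0, -1), (1, -1),   # 0: NW, 1: N,    2: NE
    (-1,  0), (0,  0), (1,  0),   # 3: W,  4: stay, 5: E
    (-1,  1), (0,  1), (1,  1),   # 6: SW, 7: S,    8: SE
]

def apply_action(x, y, action, width, height):
    """Move the explorer in the chosen direction, clamped to the grid bounds."""
    dx, dy = DIRECTIONS[action]
    return (min(max(x + dx, 0), width - 1),
            min(max(y + dy, 0), height - 1))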

The observation function returns the explored states of tiles within a certain radius of "vision" (.flatten()ed), and the reward function simply returns the number of unexplored grid tiles within this radius.

In the following diagram, which uses a vision radius of 2, █ represents an explored tile, ■ represents a tile within the radius of vision, o represents the explorer, x represents the desired object, and a blank space represents an entirely unexplored tile.

+--------+
|████████|
|██■■■█  |
|██■o■   |
|  ■■■   |
|       x|
+--------+
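
Concretely, the observation and reward computations look roughly like this (a sketch under my own assumptions; explored is a 2D NumPy boolean array covering the whole grid, and the helper names are illustrative, not my exact code):

import numpy as np

RADIUS = 2  # vision radius used in the diagram above

def observe(explored, x, y, radius=RADIUS):
    """Explored/unexplored flags of the tiles within the vision radius, flattened."""
    window = np.zeros((2 * radius + 1, 2 * radius + 1), dtype=np.float32)
    height, width = explored.shape
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            tx, ty = x + dx, y + dy
            if 0 <= tx < width and 0 <= ty < height:
                window[dy + radius, dx + radius] = explored[ty, tx]
    return window.flatten()

def reward(explored, x, y, radius=RADIUS):
    """Number of unexplored grid tiles within the vision radius."""
    height, width = explored.shape
    count = 0
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            tx, ty = x + dx, y + dy
            if 0 <= tx < width and 0 <= ty < height and not explored[ty, tx]:
                count += 1
    return float(count)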

I am using the following model. During my experimentation, I typically use a 20x20 grid with a varying number of 16-node Dense layers (both chosen arbitrarily).

from keras.models import Sequential
from keras.layers import Dense, Activation, LSTM
from keras.optimizers import Adam

from rl.agents.dqn import DQNAgent
from rl.policy import EpsGreedyQPolicy
from rl.memory import SequentialMemory

model = Sequential()
model.add(LSTM(2, input_shape=(1,) + observation_shape))
# observation_shape is the shape of the flattened observation returned by
# the environment, e.g. (25,) for a 5x5 vision window

for _ in range(nb_dense):
    model.add(Dense(dense_output))
# nb_dense and dense_output are varied manually, for testing purposes

model.add(Dense(nb_actions))  # output shape = 9 (number of directions)
model.add(Activation("linear"))
model.summary()

memory = SequentialMemory(limit=50000, window_length=1)
policy = EpsGreedyQPolicy(eps=.1)
dqn = DQNAgent(
    model=model,
    nb_actions=nb_actions,
    memory=memory,
    nb_steps_warmup=10,
    target_model_update=1e-2,
    policy=policy
)
dqn.compile(Adam(lr=1e-3), metrics=['mae'])

dqn.fit(env, nb_steps=50000, visualize=False, verbose=2)

dqn.test(env, nb_episodes=5, visualize=True)
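
For reference, env is a custom Gym-style environment wrapping the grid world; Keras-RL only needs the reset/step/render interface, so its skeleton is roughly the following (implementation details omitted, and the class and attribute names here are illustrative rather than my exact code):

import numpy as np
import gym
from gym import spaces

class GridExplorationEnv(gym.Env):
    """Skeleton of the grid-exploration environment (details omitted)."""

    def __init__(self, width=20, height=20, radius=2):
        self.action_space = spaces.Discrete(9)  # the 9 directions shown above
        self.observation_space = spaces.Box(
            low=0, high=1, shape=((2 * radius + 1) ** 2,), dtype=np.float32)

    def reset(self):
        # Clear the explored map, re-place the explorer and the object,
        # and return the initial (flattened) observation.
        ...

    def step(self, action):
        # Move the explorer, mark the newly visible tiles as explored,
        # and return (observation, reward, done, info).
        ...

    def render(self, mode="human"):
        # Draw the grid as in the diagram above.
        ...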

Unfortunately, even after much testing, the agent is still unable to find the object within anything resembling a reasonable amount of time, if at all.

  • Is there something inherently wrong with my layer setup? (Should I use more/fewer Dense layers, LSTM layers, etc.?)
  • Are my SequentialMemory, EpsGreedyQPolicy, DQNAgent, or .compile() values non-ideal for the situation?
  • Is exploration itself too complex of a problem for such a simple network to solve?

In general, how could I improve the network so that it would actually succeed in exploration, finding the object in a relatively short amount of time regardless of its placement?
