I am trying to build an AI that plays a simple Snake game, following this article. With my current solution the snake reaches a score of around 35 points. It handles the walls correctly, but it almost always dies by hitting its own tail. I cannot pinpoint my mistake, but I suspect it is in the way I call the fit function on the neural network.

for (int i = 0; i < EPOCHS; i++) {
    epsilon = 0.9;

    board.initGame();
    State state = board.getState();

    while (state.isInGame()) {
        // Get next action to perform
        final Action action = epsilonGreedyAction(state);

        // Play action and get reward
        final double reward = board.getRewardForAction(state, action);

        // Get next state
        final State nextState = board.getState();

        // Update neural network
        update(state, action, reward, nextState);

        // Apply next state
        state = nextState;
    }
}

private Action epsilonGreedyAction(final State state) {
    epsilon -= 0.001;

    double random = SnakeUtil.getRandom();
    if (random < epsilon) {
        randomActions++;
        return Action.randomAction(state);
    }

    return SnakeUtil.getMaxQAction(state, Q_TABLE);
}

getMaxQAction returns an action from Q_TABLE for this state that has had the highest reward so far (the max reward). Q_TABLE is also updated in the update method:
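To make that lookup concrete, here is a minimal sketch of what such a max-Q selection might look like. It is not the actual SnakeUtil code: the class name, the per-state Map layout, the 0.0 default for unseen actions, and the tie-breaking by enum order are all assumptions made for illustration.

```java
import java.util.EnumMap;
import java.util.Map;

// Hypothetical stand-in for SnakeUtil.getMaxQAction; not the real implementation.
final class MaxQActionSketch {
    enum Action { UP, RIGHT, DOWN, LEFT }

    // Returns the action with the highest stored Q-value for one state.
    // Assumption: actions missing from the table count as 0.0,
    // and ties go to the first action in enum order.
    static Action maxQAction(Map<Action, Double> qValuesForState) {
        Action best = Action.UP;
        double bestValue = Double.NEGATIVE_INFINITY;
        for (Action a : Action.values()) {
            double value = qValuesForState.getOrDefault(a, 0.0);
            if (value > bestValue) {
                bestValue = value;
                best = a;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        Map<Action, Double> q = new EnumMap<>(Action.class);
        q.put(Action.LEFT, 2.5);
        q.put(Action.UP, -1.0);
        System.out.println(maxQAction(q));
    }
}
```

With this layout, an action that was punished ends up below the untried actions, so the greedy branch will prefer anything not yet explored over a known bad move.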

private void update(final State state, final Action action, final double reward, final State nextState) {
    MaxQ maxQ = SnakeUtil.getMaxQ(nextState, Q_TABLE);

    double targetReward = reward + (0.90 * maxQ.getReward());
    SnakeUtil.updateQTable(state, action, targetReward, Q_TABLE);

    net.fit(buildObservation(state).getData(), Nd4j.create(fromAction(action)));
}
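For reference, the target written to the Q table in update is the usual one-step form: reward plus the discounted best value of the next state. The sketch below isolates just that arithmetic, using the 0.90 factor from the method above; the class and method names are hypothetical and nothing here touches the network.

```java
// Hypothetical helper isolating the target computation from update().
final class QTargetSketch {
    // Discount factor, matching the 0.90 literal used in update().
    static final double GAMMA = 0.90;

    // target = reward + GAMMA * (best Q-value available in the next state)
    static double target(double reward, double maxNextQ) {
        return reward + GAMMA * maxNextQ;
    }

    public static void main(String[] args) {
        // A step toward the food (reward 1) into a state whose best Q-value is 10.
        System.out.println(target(1.0, 10.0));
        // A step away from the food (reward -1.5) into a state with no known value.
        System.out.println(target(-1.5, 0.0));
    }
}
```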

private double[][] fromAction(final Action action) {
    return new double[][] {
            {
                    action.equals(Action.UP) ? 1 : 0,
                    action.equals(Action.RIGHT) ? 1 : 0,
                    action.equals(Action.DOWN) ? 1 : 0,
                    action.equals(Action.LEFT) ? 1 : 0
            }
    };
}

private Observation buildObservation(final State state) {
    return new Observation(Nd4j.create(new boolean[][]{
            {
                    state.isCanGoUp(),
                    state.isCanGoRight(),
                    state.isCanGoDown(),
                    state.isCanGoLeft(),
                    state.isFoodUp(),
                    state.isFoodUpRight(),
                    state.isFoodRight(),
                    state.isFoodDownRight(),
                    state.isFoodDown(),
                    state.isFoodDownLeft(),
                    state.isFoodLeft(),
                    state.isFoodUpLeft()
            }
    }));
}

My question is whether the fit method is receiving the correct parameters. If it is, then my problem must be somewhere else.

In addition, the reward is calculated so that every time the snake steps in the direction of the food it is rewarded with 1; otherwise it is punished with -1.5. If it eats the food, it is rewarded with 30. Any hints and suggestions are welcome.
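The reward scheme can be sketched roughly as follows. This is a simplified illustration rather than the actual board.getRewardForAction: it assumes "in the direction of the food" means the Manhattan distance to the food got smaller, and the class, method, and parameter names are hypothetical.

```java
// Hypothetical sketch of the reward scheme; not the real getRewardForAction.
final class RewardSketch {
    static final double FOOD_EATEN = 30.0;    // snake ate the food
    static final double TOWARDS_FOOD = 1.0;   // step reduced the distance to the food
    static final double AWAY_FROM_FOOD = -1.5; // any other step

    // Assumption: distances are Manhattan distances from the head to the food,
    // measured before and after the move.
    static double reward(boolean ateFood, int distanceBefore, int distanceAfter) {
        if (ateFood) {
            return FOOD_EATEN;
        }
        return distanceAfter < distanceBefore ? TOWARDS_FOOD : AWAY_FROM_FOOD;
    }

    public static void main(String[] args) {
        System.out.println(reward(false, 5, 4)); // moved closer
        System.out.println(reward(false, 4, 5)); // moved away
        System.out.println(reward(true, 1, 0));  // ate the food
    }
}
```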

mirzak
In case somebody is interested, here is my full code: https://github.com/liliumbosniacum/snakedl4j The biggest score the snake has made is 89. – mirzak Sep 28 '20 at 14:49
