I'm a newbie in RL and I'm learning stable_baselines3. I've created a simple 2D game where the goal is to catch as many falling apples as possible. If we miss an apple, it disappears and we lose a point; if we catch it, we gain 1 point. We can only move left or right. I thought the agent would learn faster if I gave it raw data (no CNN), using PPO with MlpPolicy.

The problem is that I don't know how many apples will be in the game at any given moment, only that there will be at most 10. So I thought I would create the observation_space like this:

self.observation_space = Box(0, 1, (11, 2))

Here the first element would be the position of the player, and the rest the positions of the apples. If an apple doesn't exist, I would fill its slot with (0, 0). I trained it for 100000 steps, but the agent behaves very poorly and just goes to the left edge of the screen. How can I improve it?
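
To make the padding scheme concrete, here is a minimal sketch of how such an observation could be assembled. The function name `build_observation` and the constant `MAX_APPLES` are hypothetical (they are not from my actual game code), and the sketch assumes all positions are already normalized to the range [0, 1].

```python
from typing import List, Sequence, Tuple

MAX_APPLES = 10  # upper bound on simultaneous apples, as stated in the question

Position = Tuple[float, float]


def build_observation(
    player: Position,
    apples: Sequence[Position],
    max_apples: int = MAX_APPLES,
) -> List[List[float]]:
    """Hypothetical helper: return (max_apples + 1) rows of [x, y].

    Row 0 is the player; the following rows are apples; unused apple
    slots are padded with [0.0, 0.0]. Positions are assumed to be
    normalized to [0, 1] already.
    """
    if len(apples) > max_apples:
        raise ValueError(f"expected at most {max_apples} apples, got {len(apples)}")

    rows = [[float(player[0]), float(player[1])]]
    rows += [[float(x), float(y)] for x, y in apples]
    # Pad the missing apples with (0, 0) so the shape is always fixed.
    rows += [[0.0, 0.0] for _ in range(max_apples - len(apples))]
    return rows


obs = build_observation(player=(0.5, 0.9), apples=[(0.2, 0.1), (0.7, 0.4)])
print(len(obs), len(obs[0]))
```

In the environment itself, the result would presumably be converted with something like `np.asarray(obs, dtype=np.float32)` so that it matches the `(11, 2)` shape of the Box space above.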
