Most recent research and papers in deep reinforcement learning use environments with a small, fixed set of possible actions. However, there are a few ways you could try to compensate for having a variable action space.
Let's say we have a game environment where the agent can perform different attacks. One of the attacks, the fireball, is only unlocked later in the game. Maybe you have to do something special to unlock this attack, but for the purposes of this argument, let's just assume your agent will unlock this ability at some point in the course of the game.
- You could include every attack, locked or not, in the action space and assign a large negative reward whenever the agent takes an action it has not yet unlocked. So if your agent tries to use the fireball before it has been unlocked, it gets a negative reward. However, this carries a high risk of the agent "learning" never to use the fireball, even after it is unlocked.
- You could also vary the action space, adding new actions as they become available. In this scenario, the agent would not have the fireball attack in its action space until it is unlocked. You would have to raise your epsilon (the rate of random action) whenever new actions are added, so the agent actually explores them (a sketch of this appears after the list).
- You could track the agent's available actions as part of the state. If the agent has the ability to use a fireball in one part of the game but not in another, those can be treated as different states, which informs the agent. The state vector could carry one binary flag per unlockable ability, and combined with the first approach above, your agent could learn to use unlocked abilities effectively (see the sketches after this list).
This research paper discusses reinforcement learning in continuous action spaces, which isn't quite the same thing but might give you some additional thoughts.