Most recent research and papers in deep reinforcement learning use environments with a small, fixed set of possible actions. However, there are a few ways you could try to compensate for having a variable action space.
Let's say we have a game environment where the agent can perform different attacks. One of the attacks, the fireball, is only unlocked later in the game. Maybe you have to do something special to unlock this attack, but for the purposes of this argument, let's just assume your agent will unlock this ability at some point in the course of the game.
- You could include every attack, locked or not, in the action space and assign a large negative reward whenever the agent takes an action it has not yet unlocked. So if your agent tries to use the fireball before it has been unlocked, it gets a negative reward. However, this carries a high risk of the agent "learning" never to use the fireball, even after it is unlocked.
- You could also vary the action space, adding new actions as they become available. In this scenario, the agent would not have the fireball attack in its action space until it is unlocked. You would have to raise your epsilon (the rate of random action) whenever new actions are added, so the agent actually explores them (a sketch of this appears after the list).
- You could track the agent's available actions as part of the state. If the agent has the ability to use a fireball in one part of the game but not in another, those can be treated as different states, which informs the agent. The state vector could carry one binary flag per unlockable ability, and combined with the first approach above, your agent could learn to use unlocked abilities effectively (see the sketches after this list).
This research paper discusses reinforcement learning in continuous action spaces, which isn't quite the same thing but might give you some additional thoughts.