I have an RL problem where I want the agent to select x elements out of an array of size n.
For example, if the array is [0, 1, 2, 3, 4, 5], then n = 6, and if x = 3, a valid action could be [2, 3, 5].
So far, I have tried using n scores: the agent outputs n continuous numbers, and I select the x highest ones. This works reasonably well.
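A minimal NumPy sketch of the top-x scoring approach, assuming the policy emits a length-n float vector (for instance from a continuous Box action space). The function name `scores_to_action` is hypothetical. Because it returns the indices of the x largest scores, the selected elements are always unique.

```python
import numpy as np


def scores_to_action(scores: np.ndarray, x: int) -> np.ndarray:
    """Map n continuous scores to x unique indices (hypothetical helper)."""
    # Stable argsort sorts ascending, so the last x indices belong to the x largest scores.
    top = np.argsort(scores, kind="stable")[-x:]
    # Return the indices in ascending order so each selection has one canonical form.
    return np.sort(top)


# Usage: scores for n = 6 elements, select x = 3.
scores = np.array([0.1, -0.5, 0.9, 0.7, 0.0, 0.8])
action = scores_to_action(scores, 3)  # indices 2, 3, 5
```

The stable sort breaks ties between equal scores deterministically.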
I have also tried a MultiDiscrete action with x values, each of which can be anything from 0 to n-1, and then iteratively replacing the duplicates.
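The duplicate-replacement step might look like the sketch below. The post does not say how a duplicate is replaced, so the rule used here (bump it to the next unused index, wrapping around modulo n) is an assumption, and `dedupe_multidiscrete` is a hypothetical name.

```python
import numpy as np


def dedupe_multidiscrete(action, n: int) -> np.ndarray:
    """Make the x entries of a MultiDiscrete action unique (hypothetical helper).

    Assumed rule: a repeated index is moved to the next index, modulo n,
    that has not been used yet.
    """
    if len(action) > n:
        raise ValueError("cannot pick more unique indices than there are elements")
    used = set()
    result = []
    for a in action:
        a = int(a)
        # A free index always exists because len(action) <= n.
        while a in used:
            a = (a + 1) % n
        used.add(a)
        result.append(a)
    return np.array(result)


# Usage: n = 6, x = 3, raw action with a duplicate.
fixed = dedupe_multidiscrete([2, 2, 5], 6)  # becomes 2, 3, 5
```

With this rule, distinct raw actions such as [2, 2, 5] and [2, 3, 5] map to the same selection.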
Is there another, better action space I am missing that would force the agent to make unique choices?
Many thanks for your valuable insights and tips in advance! I am happy to try all!