I want to select the elements of legal_actions that sit at the same index as the maximum value in q_values. In the example below the maximum of q_values is 2, which is at index 1, so the corresponding element of legal_actions is 1. If the maximum occurs more than once, I want the elements at all of those indices to be returned.
When I try the code below:
import numpy as np
legal_actions = [0, 1]
q_values = [1, 2]
legal_actions[np.isclose(q_values, np.max(q_values), 1e-8)]
I get the following error:
TypeError: only integer scalar arrays can be converted to a scalar index
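For reference, here is a minimal sketch of the behaviour I am after. The error seems to come from indexing a plain Python list with a NumPy boolean array, since a list only accepts an integer or a slice as its index. The sketch assumes both inputs can be converted to NumPy arrays first; the helper name best_legal_actions and the atol default are hypothetical, chosen only for illustration.

```python
import numpy as np


def best_legal_actions(legal_actions, q_values, atol=1e-8):
    """Return the legal actions whose q-value ties for the maximum.

    Hypothetical helper: the name and the atol default are assumptions.
    """
    # Convert to arrays so that boolean-mask indexing is supported.
    legal_actions = np.asarray(legal_actions)
    q_values = np.asarray(q_values, dtype=float)

    # True wherever a q-value is within atol of the maximum.
    # rtol is set to 0 so that only the absolute tolerance applies.
    mask = np.isclose(q_values, np.max(q_values), rtol=0.0, atol=atol)

    # Boolean-mask indexing keeps every element where the mask is True.
    return legal_actions[mask]


print(best_legal_actions([0, 1], [1, 2]))
print(best_legal_actions([3, 5, 7], [2.0, 1.0, 2.0]))
```

In the original call the third positional argument of np.isclose is rtol, not atol, so the sketch passes the tolerances by keyword to make the intent explicit.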