I am working with Q-learning and have a dictionary Q[state, action], where a state can be anything (a string, a number, a list, ...) depending on the application. Each state has either 3 or 4 possible actions. For each state I need to find the action with the highest Q value. The problem is that the dictionary is keyed by (state, action) pairs, so I do not know how to get all the actions for a given state directly. I have tried looping over every key:
    for statex, actionx in self.array:
        if statex == state and actionx is not None:
            y[actionx] = self.array[statex, actionx]
    y.argMax()
where argMax() is:
    def argMax(self):
        """
        Returns the key with the highest value.
        """
        if len(self.keys()) == 0:
            return None
        # list() keeps the result indexable on Python 3 as well
        items = list(self.items())
        values = [x[1] for x in items]
        maxIndex = values.index(max(values))
        return items[maxIndex][0]
The problem is that this takes too long, because the loop scans every key in the dictionary each time I look up a single state. Any ideas on how to make it faster, ideally by eliminating the for loop?
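
For reference, here is a minimal sketch of the kind of restructuring that would avoid scanning every key: keep one inner dictionary of action -> Q value per state, so all actions of a state can be looked up in a single step. The names `q_by_state`, `set_q` and `best_action` are hypothetical and not part of my actual code, and the sketch assumes states are hashable (which they already must be to serve as keys in Q[state, action]).

```python
from collections import defaultdict

# Hypothetical layout: state -> {action: Q value},
# instead of one flat dict keyed by (state, action).
q_by_state = defaultdict(dict)


def set_q(state, action, value):
    """Store the Q value for one (state, action) pair."""
    q_by_state[state][action] = value


def best_action(state):
    """Return the action with the highest Q value for `state`, or None."""
    actions = q_by_state.get(state)  # .get() does not create a missing entry
    if not actions:
        return None
    # max() only iterates over this state's 3 or 4 actions
    return max(actions, key=actions.get)


# Made-up values, purely for illustration
set_q("s0", "left", 0.1)
set_q("s0", "right", 0.7)
set_q("s0", "stay", 0.3)
set_q("s1", "left", -1.0)
```

With this layout, `best_action("s0")` only looks at the entries for "s0", so the cost per lookup depends on the number of actions rather than on the total size of the table.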