
I am applying Q-learning with function approximation to a problem where each state does not have the same set of actions available. When I calculate the target,

Target = R(s,a,s') + max_a' Q(s',a')

Since each state does not have the same set of actions, should I also include the set of available actions in my state definition? Otherwise, two states may be very similar to each other in every other feature, yet have very different sets of actions available from there onward. Even if I do include the set of actions, the problem is the length of the feature vector, because each state has a different number of actions. Please help.
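
For concreteness, here is a minimal sketch of the target computation I mean, with the max taken only over whatever actions are legal in the next state; q_value(s, a) and available_actions(s) are hypothetical stand-ins for my function approximator and environment:

    def q_learning_target(r, s_next, q_value, available_actions, gamma=1.0, terminal=False):
        # Target = R(s,a,s') + gamma * max over the actions legal in s' of Q(s',a').
        # gamma defaults to 1 to match the target written above (no explicit discount).
        if terminal or not available_actions(s_next):
            return r  # nothing to bootstrap from in a terminal state
        return r + gamma * max(q_value(s_next, a) for a in available_actions(s_next))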

Prabir

1 Answer


My suggestion would be to express actions as weighted sums of features.

For example, if you are using a neural network, your input layer would take the state features and your output layer would produce one value per action feature. You could then compute Q(s,a) as sum_i(NN(s)_i * a_i), where NN(s)_i is the value of the i-th output neuron of the neural net given input s, and a_i is the weight given to feature i by action a.

This could also be interpreted as a single neural network with a predetermined last layer whose weights change with the action being evaluated. It is conceptually messy but easy to program.
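
A minimal sketch of this idea in Python/NumPy; the layer sizes, the tanh hidden layer, and the randomly drawn state/action feature vectors are placeholder assumptions, not part of the original answer:

    import numpy as np

    rng = np.random.default_rng(0)
    STATE_DIM, HIDDEN, N_FEATURES = 8, 16, 4   # illustrative sizes only

    # Tiny feedforward net: input = state features, output = one value per
    # *action feature* (not per action), so the output size stays fixed even
    # though the number of available actions varies from state to state.
    W1 = rng.normal(scale=0.1, size=(STATE_DIM, HIDDEN))
    W2 = rng.normal(scale=0.1, size=(HIDDEN, N_FEATURES))

    def nn(state):
        """Return NN(s): one value per action feature."""
        return np.tanh(state @ W1) @ W2

    def q_value(state, action_features):
        """Q(s,a) = sum_i NN(s)_i * a_i, with a_i the action's feature weights."""
        return float(nn(state) @ action_features)

    # Usage: the max in the target runs only over the actions actually
    # available in the next state, each described by its own feature vector.
    s_next = rng.normal(size=STATE_DIM)
    available = [rng.normal(size=N_FEATURES) for _ in range(3)]  # 3 legal actions here
    best_next_q = max(q_value(s_next, a) for a in available)

Because the network's output size is tied to the number of action features rather than the number of actions, it never has to change shape when the set of available actions changes.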

LYH
  • Sorry, I did not understand your solution approach. What I am doing is: each state-action pair is a set of features used in the input layer of the neural net, and the target T = immediate reward + max payoff from the next state onward. But the max payoff from the next state onward will vary, because a different set of actions is available from the next state onward. How do I capture this in the current state without including the actions available from the next state as features in the current state-action feature vector? – Prabir Aug 26 '16 at 18:58