I am applying Q-learning with function approximation to a problem where each state does not have the same set of actions. When I compute the target,
Target = R(s,a,s') + gamma * max_{a'} Q(s',a')
Since each state does not have the same set of actions, should I also include the set of available actions in my state definition? Otherwise, what happens is that two states may be very similar to each other in every other feature, yet have very different sets of actions available from there onward. Even if I do include the action set, the problem is the length of the feature vector, because each state has a different number of actions. Please help.
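For what it's worth, here is a minimal sketch of one common way to handle this without putting the action set into the state features: score Q(s, a) from state-action features and take the max in the target only over the actions actually available in s'. All names here (the linear weights, the one-hot action encoding, the dimensions) are illustrative assumptions, not a fixed recipe:

```python
import numpy as np

# Illustrative sketch (all names/dimensions are assumptions):
# a linear Q(s, a) over concatenated [state, one-hot action] features,
# where the bootstrap max ranges only over the actions available in s'.

rng = np.random.default_rng(0)

STATE_DIM = 4
N_ACTIONS = 5  # global action space; each state allows some subset of it
w = rng.normal(size=STATE_DIM + N_ACTIONS)  # weights for [state, one-hot action]

def features(state, action):
    """Concatenate state features with a one-hot encoding of the action."""
    one_hot = np.zeros(N_ACTIONS)
    one_hot[action] = 1.0
    return np.concatenate([state, one_hot])

def q_value(state, action):
    return float(features(state, action) @ w)

def td_target(reward, next_state, available_actions, gamma=0.99):
    """Target = r + gamma * max over the *available* actions in s' only."""
    if not available_actions:  # no actions available: treat s' as terminal
        return reward
    return reward + gamma * max(q_value(next_state, a) for a in available_actions)

s_next = rng.normal(size=STATE_DIM)
print(td_target(1.0, s_next, available_actions=[0, 2, 4]))
```

The point of the sketch is that the approximator never needs a fixed-length encoding of "which actions exist"; the availability constraint enters only through which Q-values you maximize over when forming the target (and, at acting time, which actions you allow the policy to pick).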