
I am working on a car-following problem, and the measurements I receive are uncertain (I know the noise model is Gaussian and its variance is also known). How do I select my next action under this kind of uncertainty?

Basically, how should I change my cost function so that I can optimize my plan by selecting an appropriate action?

devil in the detail

1 Answer


Vanilla reinforcement learning is meant for Markov decision processes (MDPs), where it is assumed that you can fully observe the state. Because your observations are noisy, you actually have a partially observable Markov decision process (POMDP). Strictly speaking, you should be looking at a different category of RL approaches.

Practically, since you know so much about the parameters of the uncertainty, you should consider using a Kalman filter or particle filter to perform state estimation, then use the most likely state estimate as the true state in your RL problem. The estimate will be wrong at times, of course, but if you use a function-approximation approach for the value function, experience can generalize across similar states and you will still be able to learn. Learning performance will be proportional to the quality of your state estimate.
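To make the suggestion concrete, here is a minimal sketch (my illustration, not part of the original answer) of a 1-D constant-velocity Kalman filter tracking the lead vehicle's relative position, with a simple innovation gate that rejects wildly implausible readings. The time step, noise covariances, gate threshold, and initial belief are all assumed values for illustration.

```python
import numpy as np

# State: [relative position, relative velocity] of the lead vehicle.
# All numbers (dt, Q, R, GATE) are illustrative assumptions.
dt = 0.1                        # control time step (s), assumed
F = np.array([[1.0, dt],        # constant-velocity transition model
              [0.0, 1.0]])
H = np.array([[1.0, 0.0]])      # we measure relative position only
Q = np.diag([1e-3, 1e-2])       # process noise covariance (assumed)
R = np.array([[0.5]])           # known measurement noise variance

x = np.array([[10.0], [0.0]])   # initial belief: lead car 10 m ahead
P = np.eye(2)                   # initial state covariance
GATE = 9.0                      # ~3-sigma gate on the normalized innovation

def kalman_step(x, P, z):
    # Predict one step forward under the motion model.
    x = F @ x
    P = F @ P @ F.T + Q
    # Innovation: how surprising is this measurement given the prediction?
    y = z - (H @ x)[0, 0]
    S = (H @ P @ H.T + R)[0, 0]
    if y * y / S > GATE:
        return x, P             # spurious reading: keep the prediction
    K = P @ H.T / S             # Kalman gain (S is scalar here)
    x = x + K * y
    P = (np.eye(2) - K @ H) @ P
    return x, P

# A spurious -1.0 reading ("we overtook the lead car") fails the gate
# and is ignored; the estimate stays near the true 10 m gap.
for z in [9.8, 10.1, 9.9, -1.0, 10.0]:
    x, P = kalman_step(x, P, z)
print(round(float(x[0, 0]), 2))
```

The position estimate `x[0, 0]` (rather than the raw measurement) would then be fed into the RL state, which is exactly why a single bad reading like the one discussed in the comments below is not disruptive.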

Nick Walker
  • I can't use a Kalman filter. Say the ego vehicle is behind the other vehicle, but the measurement it receives says it has overtaken the other vehicle (due to uncertainty, a larger variance spreads measurements over a larger region). In that case, the ego vehicle won't be able to follow the other vehicle. I guess this is very complicated, but how can we tell which measurements are good? – devil in the detail Mar 15 '17 at 05:20
  • Inference-based state estimates draw on multiple readings across time and a model of how the vehicles actually move (your priors). This means that even if one reading suggests you have passed the target vehicle, your history of readings and your prior belief that the vehicles will not move much in one time step make that state unlikely. A single spurious measurement will not be disruptive. For reference, you may want to look at robot localization, where state-estimation approaches are used with much success. – Nick Walker Mar 15 '17 at 05:39
  • This sounds great. Let me try it; I'll post the outcome. I could try an information vector in place of the state vector. What do you say? – devil in the detail Mar 15 '17 at 06:29
  • An information vector is the set of all previous measurements and actions. It is used when we have incomplete information about the state. – devil in the detail Mar 16 '17 at 06:27
  • You could do that, but that would take the scale of the state space from O(|S|) to O(|S|^t). For any reasonable t, this is intractable. It would be the best that could be done if there were no prior knowledge, but that isn't the case in this problem. – Nick Walker Mar 16 '17 at 06:39
  • How? My state space is going to stay the same. I am just increasing the number of variables in my state vector; as far as finding the next state of the vehicle is concerned, this information can be extrapolated. – devil in the detail Mar 16 '17 at 06:43
  • The size of the state space is the number of unique combinations you can make with the variables you use to represent it. If you include past observations in your state vector, you are increasing the size of the state space (relative to the baseline of one observation == state). Say we could describe the state space as a single integer in the range [0,9]. Let's call that S. Clearly there are 10 states. Now say we want to include a previous state as well. We make a new state space consisting of two elements of S, so we have |S| choose 2 possibilities; now there are 45 possible states. – Nick Walker Mar 16 '17 at 07:23
  • Yes, I agree, but each observation will still be derived from the same state space. If S=[0,9] and O1=2, O2=3, O3=5, and so on, there are many measurements, but all from the same state space. The state space is the set of all possible states; the information vector is not a state, it is a combination of states. We won't be defining an information space. – devil in the detail Mar 16 '17 at 07:28
  • "I can try information vector in place of state vector." I took this to mean that you want to use the information vector in place of the state vector during learning. If this is the case, then the information vector becomes a state vector, and the size of the information space becomes the size of the state space for this new learning problem. Were you suggesting something different? – Nick Walker Mar 16 '17 at 07:40
  • Yes, I was suggesting keeping an information vector (I_t), but the state at time t will be represented by s_t only. – devil in the detail Mar 16 '17 at 09:25
  • Please edit your question with your idea of how this information vector would help you learn, and I'll amend my answer. – Nick Walker Mar 16 '17 at 17:31