
I'm playing around with making a self-driving car in a PC game. I'm thinking of using reinforcement learning and giving the car a location on the map to drive to. The reward would be a function of the distance to the waypoint, plus something very negative if the car crashes.

I can't wrap my head around how to add the waypoint into the system, though. I'm using the car's camera feed as the input to the model, and I can calculate the reward from its current position and the waypoint... but I don't always want the car to drive to the same spot. I want to give it a waypoint and have it drive there without crashing into anything.

How do I incorporate the waypoint and current position into the state / model?


1 Answer


Collision prevention

To keep the car from crashing, you need to incentivize the agent, at every step, to take actions that avoid a collision. You can do this by having your reward function incorporate penalties for things like lane deviation and high g-forces, along with positive rewards for getting closer to the waypoint.
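As a rough illustration (not DeepDrive's actual reward; the signal names are hypothetical and assume your game exposes them, and the weights need tuning), a shaped step reward might look like:

```python
# A minimal shaped-reward sketch, assuming your game exposes these signals
# each step; all names here are hypothetical, and the weights need tuning.

def step_reward(prev_dist_to_waypoint: float, dist_to_waypoint: float,
                lane_deviation: float, g_force: float, collided: bool) -> float:
    if collided:
        return -100.0                                   # large terminal crash penalty
    progress = prev_dist_to_waypoint - dist_to_waypoint  # > 0 when closing in
    reward = 1.0 * progress                             # getting closer is good
    reward -= 0.5 * abs(lane_deviation)                 # penalize drifting off lane center
    reward -= 0.2 * max(0.0, g_force - 1.0)             # penalize harsh g-forces
    return reward
```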

State parameters

One way to think about state is as a set of parameters that can be used to pick the action that maximizes (discounted cumulative) reward. The waypoint and current position alone are not very informative in this regard: there is no optimal action to choose given just your current location and destination. The optimal action also depends on factors like speed, acceleration, throttle, and distance to the lane center, so you'd be better off recording these as your state parameters.
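For instance, a scalar state along those lines might look like the following sketch (the field names are mine, not a fixed API, and assume the game can report these values each step):

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class CarState:
    speed: float                 # m/s
    acceleration: float          # m/s^2
    throttle: float              # in [0, 1]
    steering: float              # in [-1, 1]
    dist_to_lane_center: float   # m, signed (left/right of center)
    heading_error: float         # rad, car heading vs. lane direction

    def to_vector(self) -> np.ndarray:
        """Flatten into the vector the agent actually consumes."""
        return np.array([self.speed, self.acceleration, self.throttle,
                         self.steering, self.dist_to_lane_center,
                         self.heading_error], dtype=np.float32)
```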

Take a look at the environment used by DeepDrive, a simulator for testing self-driving cars. Note how its reward function incorporates collision avoidance, minimizing distance to the destination, and adherence to the road, and note its choice of state parameters.
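As for the original question of how the waypoint and position get into the model: one common pattern (a sketch under assumptions, not DeepDrive's exact architecture) is to encode the camera frame with a CNN and concatenate the scalar state, including the goal expressed relative to the car, before the output layers:

```python
import torch
import torch.nn as nn

class DrivingPolicy(nn.Module):
    def __init__(self, n_actions: int, n_scalars: int):
        super().__init__()
        self.encoder = nn.Sequential(            # camera encoder
            nn.Conv2d(3, 32, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3, stride=1), nn.ReLU(),
            nn.Flatten(),
            nn.LazyLinear(512), nn.ReLU(),       # infers the flattened size
        )
        self.head = nn.Sequential(
            nn.Linear(512 + n_scalars, 256), nn.ReLU(),
            nn.Linear(256, n_actions),           # e.g. Q-values or action logits
        )

    def forward(self, image: torch.Tensor, scalars: torch.Tensor):
        # image: (B, 3, H, W); scalars: (B, n_scalars) -- speed, throttle,
        # relative-goal features, etc.
        return self.head(torch.cat([self.encoder(image), scalars], dim=1))
```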

  • So where I'm getting stuck is "along with positive rewards for getting closer to the waypoint". Yes, I can calculate the distance from the current location to the waypoint and use that as a reward... but when I take the model out of the training environment and actually use it, how do I tell it "go to this location"? I don't really care if it takes an optimal path; I'd be happy with it just trying to minimize the distance and take turns it thinks will get it there. – DaveS Feb 13 '19 at 15:29
  • Right now, when I calculate the distance to the waypoint and use that as a reward, it trains specifically for that path, so it wouldn't generalize if the waypoint changed... It seems to me that just rewarding the distance isn't enough, because the agent isn't really trying to get to a general waypoint; the point it is trying to reach is static. It doesn't actually know it is trying to get there; it just knows that going in that direction is good. – DaveS Feb 13 '19 at 16:53
  • Would the state space be image width × image height × image depth × current location × destination? – DaveS Feb 13 '19 at 20:09
  • That would lead to a very large state space. Instead of giving rewards for distance to the waypoint, you can try using progress along the route (sketched after these comments). In many cases the game itself generates a route to follow when you set a waypoint; you just need to track whether your car followed it and how much distance it traveled. This is a more general way of making the car reach the waypoint, and it will also learn to reach it safely. – Dwait Bhatt Feb 15 '19 at 06:42
  • Also, the car will try to take turns to reach the waypoint instead of moving along a straight line towards it, because of the provided route as well as the collision-avoidance factors in the reward. This is how "positive rewards for getting closer to the waypoint" were given in the DeepDrive link I posted as well. – Dwait Bhatt Feb 15 '19 at 06:52
  • Hmmm... that would mean there has to be a route planned, though, right? My waypoint is really just coordinates; there is no way (at least right now, or that I can easily think of) for it to plan a route on the map. Maybe I need to rethink all of this. – DaveS Feb 15 '19 at 16:57
  • 1
  • Author of Deepdrive here. Essentially, you want rewards to be speed-based (progress over time towards the waypoint; see the sketch below). Generalization to other waypoints will come in the form of staying within the lane (at high speed), so long as your state space includes some way to determine where the ego vehicle is with respect to the lane. If you want to decide on a turn, the route info has to be in the input as well (i.e., next turn is left). After the car learns to move within the lane, you can start tuning the reward to include comfort (i.e., minimizing g-forces). – crizCraig Apr 22 '19 at 23:10
  • Adding obstacle info to the input would allow maneuvering out of the lane to pass, say, a double-parked car. – crizCraig Apr 22 '19 at 23:10
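Putting the comment thread together, here is a hedged sketch of the speed-based, route-progress reward that Dwait Bhatt and crizCraig describe, plus a goal representation that generalizes across waypoints. The (x, y) route polyline, the helper names, and the car's-frame goal encoding are all assumptions about what your game can expose, not a fixed API:

```python
import numpy as np

# `route` is assumed to be an (N, 2) polyline of (x, y) points that the
# game's own planner produces when you set a waypoint.

def distance_along_route(route: np.ndarray, pos) -> float:
    """Arc length traveled along the route at the vertex nearest to `pos`
    (a simple nearest-vertex approximation)."""
    seg_lengths = np.linalg.norm(np.diff(route, axis=0), axis=1)
    cumlen = np.concatenate([[0.0], np.cumsum(seg_lengths)])
    nearest = np.argmin(np.linalg.norm(route - np.asarray(pos), axis=1))
    return float(cumlen[nearest])

def progress_reward(route, prev_pos, pos, dt):
    # Progress over time toward the waypoint: effective speed along the route.
    delta = distance_along_route(route, pos) - distance_along_route(route, prev_pos)
    return delta / dt

def relative_goal(pos, heading_rad, waypoint):
    # Express the goal relative to the car (distance + bearing in the car's
    # frame) so the policy can generalize to waypoints it never trained on.
    d = np.asarray(waypoint, dtype=np.float64) - np.asarray(pos, dtype=np.float64)
    dist = np.linalg.norm(d)
    bearing = np.arctan2(d[1], d[0]) - heading_rad
    return np.array([dist, np.sin(bearing), np.cos(bearing)], dtype=np.float32)
```

Encoding the goal as distance plus bearing in the car's frame, rather than as absolute map coordinates, keeps the policy from memorizing a single path, which is exactly the overfitting to a static waypoint described in the comments above.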