I have been playing around with the MIT DeepTraffic challenge, and I have also watched the lecture and read the slides.
After getting a general understanding of the architecture, I was wondering what exactly the reward function provided by the environment is.
- Is it the same as the input of the grid cell (the maximum drivable speed)?
- And are they using reward clipping or not?
I also found this JavaScript codebase, but it did not really help my understanding either.
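
To make the reward clipping question concrete, this is roughly what I mean by clipping. It is only a minimal sketch: the function name, the bounds of -1 and 1, and the sample values are my own assumptions for illustration and are not taken from the DeepTraffic code.

```python
def clip_reward(reward: float, low: float = -1.0, high: float = 1.0) -> float:
    """Clamp a raw reward into the interval [low, high].

    The bounds are illustrative assumptions, not values from DeepTraffic.
    """
    return max(low, min(high, reward))


# Hypothetical raw rewards on some arbitrary scale (for example, speed-based).
raw_rewards = [-3.0, -0.5, 0.0, 0.4, 2.5]

# Values outside [-1, 1] are cut to the nearest bound; the rest pass through.
clipped = [clip_reward(r) for r in raw_rewards]
print(clipped)
```

If the environment does something like this, large differences in the raw signal would be flattened before the agent sees them, which is why I would like to know whether it is applied.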