In Q-learning we have reward feedback; does that mean the agent needs to know the reward function in advance?
1 Answer
The agent need not have any knowledge of the reward function, but it must receive a reward for every step taken. Note that the rewards can all be zero until the episode ends. The term "reward feedback" simply means that some scalar value is returned for each transition.
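To illustrate, here is a minimal tabular Q-learning sketch (the `ChainEnv` class and all hyperparameters are hypothetical, invented for this example). The agent never inspects the reward function; it only observes the scalar returned by `step()`, which is zero on every transition until the final one.

```python
import numpy as np

# Hypothetical sparse-reward environment: the agent walks a 1-D chain and
# receives +1 only on reaching the last state. Every intermediate step
# returns a reward of 0, so the feedback is zero until the episode ends.
class ChainEnv:
    def __init__(self, n_states=6):
        self.n_states = n_states
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):  # action: 0 = left, 1 = right
        self.state = max(0, self.state - 1) if action == 0 else self.state + 1
        done = self.state == self.n_states - 1
        reward = 1.0 if done else 0.0  # scalar feedback for this transition
        return self.state, reward, done

env = ChainEnv()
Q = np.zeros((env.n_states, 2))
alpha, gamma, epsilon = 0.1, 0.99, 0.1

for episode in range(500):
    s, done = env.reset(), False
    while not done:
        # epsilon-greedy action selection
        a = np.random.randint(2) if np.random.rand() < epsilon else int(np.argmax(Q[s]))
        s_next, r, done = env.step(a)
        # The update uses only the observed scalar reward r; no model of
        # the reward function is needed.
        Q[s, a] += alpha * (r + gamma * np.max(Q[s_next]) * (not done) - Q[s, a])
        s = s_next

print(Q)
```

Even though almost every observed reward is zero, the terminal +1 propagates backwards through the Q-values over repeated episodes, which is exactly why per-step feedback (rather than advance knowledge of the reward function) is all Q-learning requires.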

nsidn98