My problem is the following. I have a simple grid world:
https://i.stack.imgur.com/xrhJw.png
The agent starts at the initial state labeled START, and the goal is to reach the terminal state labeled END. However, the agent has to avoid the barriers labeled X, and before reaching the END state it has to collect all the items labeled F. I implemented this with both Q-Learning and Sarsa, and the agent reaches the END state and avoids the barriers (X states), so this part works well.
My question is: how can I make the agent collect all the items (F states) before it reaches the END state? With either Q-Learning or Sarsa it avoids the barriers and reaches the END state, but it does not collect all the items. Usually it visits one F state and then heads straight to the END state.
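For concreteness, here is a minimal sketch of this kind of setup with tabular Q-Learning. Everything specific in it is an assumption for illustration and not my exact code: the 4x4 layout (the real grid is in the image above), the reward values, the hyperparameters, the choice that hitting a barrier ends the episode, and the names `GRID`, `step` and `q_learning`. The state is assumed to be only the agent's position, so nothing in it records which F cells have already been visited.

```python
import random
from collections import defaultdict

# Hypothetical 4x4 layout (placeholder, not the grid from the image):
# S = START, E = END, X = barrier, F = item, . = empty cell
GRID = [
    "S..F",
    ".X..",
    "F.X.",
    "...E",
]
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right


def step(pos, action):
    """Move one cell and return (next_pos, reward, done). Rewards are assumed values."""
    r, c = pos[0] + action[0], pos[1] + action[1]
    if not (0 <= r < len(GRID) and 0 <= c < len(GRID[0])):
        return pos, -1.0, False      # off the grid: stay in place, pay the step cost
    cell = GRID[r][c]
    if cell == "X":
        return (r, c), -10.0, True   # assumption: a barrier ends the episode
    if cell == "E":
        return (r, c), 10.0, True    # END is terminal whether or not items were collected
    if cell == "F":
        return (r, c), 1.0, False    # item reward, paid on every visit
    return (r, c), -1.0, False       # ordinary step cost


def q_learning(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-Learning where the state is the (row, col) position only."""
    rng = random.Random(seed)
    q = defaultdict(float)           # keyed by ((row, col), action_index)
    n_actions = len(ACTIONS)
    for _ in range(episodes):
        pos, done, steps = (0, 0), False, 0
        while not done and steps < 200:   # step cap so an episode cannot run forever
            if rng.random() < epsilon:    # epsilon-greedy exploration
                a = rng.randrange(n_actions)
            else:
                a = max(range(n_actions), key=lambda i: q[(pos, i)])
            nxt, reward, done = step(pos, ACTIONS[a])
            best_next = 0.0 if done else max(q[(nxt, i)] for i in range(n_actions))
            # Standard Q-Learning update toward reward + gamma * max_a' Q(s', a')
            q[(pos, a)] += alpha * (reward + gamma * best_next - q[(pos, a)])
            pos, steps = nxt, steps + 1
    return q
```

Calling `q_learning()` returns the learned table. For Sarsa, the only change would be replacing `best_next` with the Q-value of the action actually chosen in the next state.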
Thank you for your help!