I’m wondering about the reward policy in a DQN model. I’m learning how to use DQN to solve problems, so I’m applying it to a deterministic case whose answer I already know.
I’m developing a DQN model that finds the threshold that maximizes a metric of a classification ML model, for example, the threshold that maximizes the F1 score. In this example, my states are values in the range (0, 1), and my two actions are to decrease or increase the current threshold by 0.01 (see the sketch below).
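Here is a minimal sketch of the setup I have in mind; the data and the names `y_true`, `y_prob`, `f1_at`, and `step` are placeholders for my actual classifier outputs:

```python
import numpy as np
from sklearn.metrics import f1_score

# Placeholder data; in my real setup these come from a trained classifier.
rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=1000)            # ground-truth labels
y_prob = np.clip(rng.normal(y_true, 0.3), 0, 1)   # predicted probabilities

def f1_at(threshold):
    """F1 score obtained by binarizing the probabilities at `threshold`."""
    return f1_score(y_true, (y_prob >= threshold).astype(int))

# State: a threshold in (0, 1). Actions: 0 = decrease, 1 = increase by 0.01.
def step(state, action):
    delta = -0.01 if action == 0 else 0.01
    return float(np.clip(state + delta, 0.01, 0.99))
```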
So, I tried several ways to set the reward policy and ended up with one formulated in terms of the metric I want to maximize: if the F1 score at the next state is greater than the F1 score at the current state, the reward is 1.
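Concretely, the reward looks like this (using `f1_at` from the sketch above; the value returned when F1 does not improve is a placeholder, since varying that case is part of what I experimented with):

```python
def reward(state, next_state):
    """+1 if the transition improved the F1 score.

    What to return otherwise (0, -1, ...) is one of the things I varied.
    """
    return 1.0 if f1_at(next_state) > f1_at(state) else 0.0
```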
My main question is whether this approach to computing rewards is correct, or at least reasonable. I’m worried that I might be violating some principle of DQN by defining the reward in terms of both the next and the current state.