For example, let's assume that the reward for an action was negative, and the agent was trained very thoroughly to avoid that action.
In this situation, if I change the reward for that action to a positive one and continue training the already-trained agent, would it be difficult for the agent to learn to expect the positive reward for that action?
If so, would it be better to start training all over again?
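
To make the scenario concrete, here is a minimal sketch of the kind of experiment I have in mind. It assumes a toy two-action bandit with tabular epsilon-greedy value updates; the `train` function, the reward values, and the hyperparameters are all hypothetical and chosen only for illustration, so a real environment or a neural-network agent may behave differently.

```python
import random

def train(q, rewards, steps, epsilon, alpha=0.1, seed=0):
    """Epsilon-greedy value updates on a toy bandit (hypothetical setup)."""
    rng = random.Random(seed)
    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: pick an action uniformly at random.
            a = rng.randrange(len(q))
        else:
            # Exploit: pick the action with the highest current estimate.
            a = max(range(len(q)), key=lambda i: q[i])
        # Move the estimate for the chosen action toward the observed reward.
        q[a] += alpha * (rewards[a] - q[a])
    return q

old_rewards = [0.0, -1.0]  # action 1 is punished (assumed values)
new_rewards = [0.0, 1.0]   # action 1 is now rewarded (assumed values)

# Phase 1: train until the agent reliably avoids action 1.
q_pretrained = train([0.0, 0.0], old_rewards, steps=2000, epsilon=0.1)

# Phase 2a: continue training on the new rewards with no exploration.
q_greedy = train(list(q_pretrained), new_rewards, steps=2000, epsilon=0.0)

# Phase 2b: continue training on the new rewards, keeping some exploration.
q_explore = train(list(q_pretrained), new_rewards, steps=2000, epsilon=0.1, seed=1)

# Phase 2c: start over from scratch on the new rewards.
q_scratch = train([0.0, 0.0], new_rewards, steps=2000, epsilon=0.1, seed=2)

print("pretrained:          ", q_pretrained)
print("continued, greedy:   ", q_greedy)
print("continued, exploring:", q_explore)
print("from scratch:        ", q_scratch)
```

In this toy setup, the continued run with `epsilon=0.0` never selects action 1 again, so its estimate for that action cannot change, while the runs that keep exploring do eventually sample the new reward. I am unsure how far this carries over to larger problems, which is what the question above is about.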