StableBaselines3 - Why does calling "model.learn(50,000)" twice not give same result as calling "model.learn(100,000)" once?

Question

I am working on a Reinforcement Learning problem in StableBaselines3.

I am trying to understand why this code:

model = MaskablePPO(MaskableActorCriticPolicy, env, verbose=1, learning_rate=0.0003, gamma=0.975, seed=10, batch_size=256, clip_range=0.2)
model.learn(100000)

Does not give the exact same result as this code:

model = MaskablePPO(MaskableActorCriticPolicy, env, verbose=1, learning_rate=0.0003, gamma=0.975, seed=10, batch_size=256, clip_range=0.2)
model.learn(50000)
model.learn(50000)

I say they don't give the same results because in both cases, I tested out the model on a test-set through a for-loop, and the performance was different. Given that I set deterministic=True in the for-loop and I didn't change the seed, the different performance must mean the networks are different, which means the training process was different.

I was under the impression that if I run model.learn() on an existing model, it would just pick up the training where it was previously left off, but I guess that's incorrect.

Can someone help me understand why those two situations deliver different results?

StableBaselines3 - Why does calling "model.learn(50,000)" twice not give same result as calling "model.learn(100,000)" once?

0 Answers0