I'm training an LLM (LLaMA-6B) and have noticed that its loss drops in a stair-like fashion over the epochs. Specifically, the loss barely changes within an epoch, and then drops sharply as soon as a new epoch begins.
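For context, the training follows a standard per-epoch loop along these lines (a simplified sketch with illustrative names, not my exact code; `model` stands in for the LLaMA checkpoint, which is loaded Hugging-Face style and returns a `.loss` attribute):

```python
import torch
from torch.utils.data import DataLoader

def train(model, train_dataset, collate_fn, num_epochs=3, lr=2e-5):
    # Shuffling happens per epoch; the whole dataset is seen once per epoch.
    loader = DataLoader(train_dataset, batch_size=4, shuffle=True,
                        collate_fn=collate_fn)
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr)

    step_losses, epoch_starts = [], []
    for epoch in range(num_epochs):
        epoch_starts.append(len(step_losses))   # mark where each epoch begins
        for batch in loader:
            outputs = model(**batch)            # HF-style forward pass with labels
            loss = outputs.loss
            loss.backward()
            optimizer.step()
            optimizer.zero_grad()
            step_losses.append(loss.item())     # per-step loss, plotted below
    return step_losses, epoch_starts
```

The figure below plots `step_losses`, and the stair-step drops line up with the `epoch_starts` markers.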
I'm curious about what might be causing this phenomenon. Is it something to do with the learning rate, or perhaps the architecture of the model itself? Any insights would be greatly appreciated!

[Figure: training loss curve showing stair-step drops at epoch boundaries]