
I'm training an LLM (LLaMA-6B) and have noticed that its loss drops in a stair-like fashion over the epochs. Specifically, the loss changes very little within an epoch, and then drops sharply at the start of each new epoch.

I'm curious about what might be causing this phenomenon. Is it something to do with the learning rate, or perhaps the architecture of the model itself? Any insights would be greatly appreciated!

[Figure: training loss curve]

Jing zhao

1 Answer


Tricky to answer without seeing your code. However, my guess would be that at the beginning of the 2nd epoch the model starts seeing the same data again and begins memorizing it, so the loss takes a big dip at the start of each new epoch.
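One way to check this guess is to track training loss against loss on a held-out split. Below is a minimal sketch, assuming a standard PyTorch training loop and a validation set (neither is shown in the question), with toy data standing in for the real corpus: if the training loss keeps stepping down at epoch boundaries while the validation loss stays flat, the drop is likely memorization of repeated examples rather than genuine improvement.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset, random_split

torch.manual_seed(0)

# Hypothetical toy data standing in for the actual training set.
inputs = torch.randn(1000, 32)
targets = torch.randn(1000, 1)
dataset = TensorDataset(inputs, targets)
train_set, val_set = random_split(dataset, [800, 200])

train_loader = DataLoader(train_set, batch_size=64, shuffle=True)
val_loader = DataLoader(val_set, batch_size=64)

model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 1))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

for epoch in range(5):
    # Training pass: the model revisits the same examples every epoch,
    # so its loss on them can keep falling even without generalization.
    model.train()
    train_loss = 0.0
    for x, y in train_loader:
        optimizer.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        optimizer.step()
        train_loss += loss.item() * x.size(0)
    train_loss /= len(train_set)

    # Validation pass on data the model never trains on.
    model.eval()
    val_loss = 0.0
    with torch.no_grad():
        for x, y in val_loader:
            val_loss += loss_fn(model(x), y).item() * x.size(0)
    val_loss /= len(val_set)

    # A widening gap (train loss falling, val loss flat or rising)
    # points to memorization of the repeated training data.
    print(f"epoch {epoch}: train_loss={train_loss:.4f}  val_loss={val_loss:.4f}")
```

If both curves drop together at epoch boundaries, memorization alone probably isn't the explanation and it's worth looking at things like the learning-rate schedule or how the loss is being averaged and logged.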