I'm training an LLM (LLaMA-6B) and have noticed that its loss drops in a stair-like fashion over the epochs. Specifically, the loss barely changes within an epoch, and then drops sharply as soon as a new epoch begins.
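For context, the training follows a standard per-epoch loop along these lines (a simplified sketch with illustrative names, not my exact code; `model` stands in for the LLaMA checkpoint, which is loaded Hugging-Face style and returns a `.loss` attribute):

```python
import torch
from torch.utils.data import DataLoader

def train(model, train_dataset, collate_fn, num_epochs=3, lr=2e-5):
    # Shuffling happens per epoch; the whole dataset is seen once per epoch.
    loader = DataLoader(train_dataset, batch_size=4, shuffle=True,
                        collate_fn=collate_fn)
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr)

    step_losses, epoch_starts = [], []
    for epoch in range(num_epochs):
        epoch_starts.append(len(step_losses))   # mark where each epoch begins
        for batch in loader:
            outputs = model(**batch)            # HF-style forward pass with labels
            loss = outputs.loss
            loss.backward()
            optimizer.step()
            optimizer.zero_grad()
            step_losses.append(loss.item())     # per-step loss, plotted below
    return step_losses, epoch_starts
```

The figure below plots `step_losses`, and the stair-step drops line up with the `epoch_starts` markers.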
I'm curious about what might be causing this phenomenon. Is it something to do with the learning rate, or perhaps the architecture of the model itself? Any insights would be greatly appreciated!

[Figure: training loss curve showing stair-step drops at epoch boundaries]