
What can cause the loss from model.get_latest_training_loss() to increase on each epoch?

Code used for training:

import os
import multiprocessing

from gensim.models import Word2Vec
from gensim.models.callbacks import CallbackAny2Vec


class EpochSaver(CallbackAny2Vec):
    '''Callback to save the model after each epoch and show training parameters.'''

    def __init__(self, savedir):
        self.savedir = savedir
        self.epoch = 0
        os.makedirs(self.savedir, exist_ok=True)

    def on_epoch_end(self, model):
        savepath = os.path.join(self.savedir, "model_neg{}_epoch.gz".format(self.epoch))
        model.save(savepath)
        print(
            "Epoch saved: {}".format(self.epoch + 1),
            "Start next epoch ...", sep="\n"
        )
        # delete the checkpoint from the previous epoch, if there is one
        previous = os.path.join(self.savedir, "model_neg{}_epoch.gz".format(self.epoch - 1))
        if os.path.isfile(previous):
            print("Previous model deleted")
            os.remove(previous)
        self.epoch += 1
        print("Model loss:", model.get_latest_training_loss())


def train():
    ### Initialize model ###
    print("Start training Word2Vec model")

    workers = multiprocessing.cpu_count() // 2

    model = Word2Vec(
        DocIter(),
        size=300, alpha=0.03, min_alpha=0.00025, iter=20,
        min_count=10, hs=0, negative=10, workers=workers,
        window=10, callbacks=[EpochSaver("./checkpoints")],
        compute_loss=True,
    )

Output:

Losses from epochs (1 to 20):

Model loss: 745896.8125
Model loss: 1403872.0
Model loss: 2022238.875
Model loss: 2552509.0
Model loss: 3065454.0
Model loss: 3549122.0
Model loss: 4096209.75
Model loss: 4615430.0
Model loss: 5103492.5
Model loss: 5570137.5
Model loss: 5955891.0
Model loss: 6395258.0
Model loss: 6845765.0
Model loss: 7260698.5
Model loss: 7712688.0
Model loss: 8144109.5
Model loss: 8542560.0
Model loss: 8903244.0
Model loss: 9280568.0
Model loss: 9676936.0

What am I doing wrong?

The language is Arabic. DocIter yields lists of tokens as input.

– Dasha

2 Answers


Up through gensim 3.6.0, the reported loss value may not be very sensible: the tally is only reset on each call to train(), rather than after each internal epoch, so it accumulates across epochs. There are some fixes forthcoming in this pull request:

https://github.com/RaRe-Technologies/gensim/pull/2135

In the meantime, the difference between the previous value and the latest may be more meaningful. In that case, your data suggest the 1st epoch had a total loss of 745,896, while the last had (9,676,936 - 9,280,568 =) 396,368, which may indicate the kind of progress hoped for.
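For illustration, a small sketch of that differencing, applied to the cumulative values printed in the question:

# Cumulative tallies printed by get_latest_training_loss() after each of the 20 epochs
cumulative = [
    745896.8125, 1403872.0, 2022238.875, 2552509.0, 3065454.0,
    3549122.0, 4096209.75, 4615430.0, 5103492.5, 5570137.5,
    5955891.0, 6395258.0, 6845765.0, 7260698.5, 7712688.0,
    8144109.5, 8542560.0, 8903244.0, 9280568.0, 9676936.0,
]

# The first epoch's loss is the first tally; every later epoch's loss is the
# difference between consecutive tallies.
per_epoch = [cumulative[0]] + [
    curr - prev for prev, curr in zip(cumulative, cumulative[1:])
]

for i, loss in enumerate(per_epoch, start=1):
    print("Epoch {}: {:.1f}".format(i, loss))

Computed this way, the per-epoch values generally trend downward, with some bouncing around, even though the raw tallies keep growing.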

– gojomo
  • Thank you for your answer! But the loss is bigger at the last stage; as far as I understand, I should only take the difference into account and interpret that as progress, shouldn't I? And can I get a more appropriate loss if I call train() myself instead of just passing an `iter` value to my model? – Dasha Aug 29 '18 at 09:23
  • Calling `train()` multiple times, if you aren't already deeply familiar with the code's internal operations, often goes wrong... so I don't recommend that. (It's fragile & error-prone, & most online examples I see are wrong.) The model is generally *trying* to lower its loss after each training-example – but it may never get objectively very good at its internal word predictions, and doesn't need to for the word-vectors to still be useful for downstream tasks. (A model with lower loss doesn't necessarily give better word-vectors than one with higher!) – gojomo Aug 29 '18 at 19:05
  • And it's natural for the loss-through-a-full-epoch to bounce higher and lower for a while, then eventually stop improving, upon model "convergence". That means it's roughly as good as a model of that complexity can get, for a certain training corpus, and further epochs will just jitter the overall loss a little up and down, but no longer reliably drive it lower. So you shouldn't worry too much about the last epoch-to-epoch delta. Why is loss of interest to you? – gojomo Aug 29 '18 at 19:07
  • (Separately, changing the default `alpha`/`min_alpha` isn't something I'd usually tinker with, unless sure of the reasons why and able to verify the changes are improving the results on downstream tasks.) – gojomo Aug 29 '18 at 19:08
  • @DashaOrgunova, did you try to use an early-stopping mechanism in the callback function? (See the sketch after these comments.) – gaurav1207 Oct 03 '18 at 21:17
  • @gaurav1207 no, I did not! – Dasha Oct 04 '18 at 08:11
  • @gojomo, I've bumped the gensim version, if you don't mind. (Wasn't able to edit the version number in-place.) – DimG Dec 24 '18 at 14:36
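Following up on the early-stopping suggestion, here is a rough sketch of what such a callback might look like. The EarlyStopper name, the patience and min_improvement parameters, and the trick of raising an exception from on_epoch_end to abort training are illustrative assumptions, not an official gensim API:

from gensim.models.callbacks import CallbackAny2Vec

class EarlyStopper(CallbackAny2Vec):
    """Hypothetical callback: stop training once the per-epoch loss delta stops improving."""

    def __init__(self, patience=3, min_improvement=0.01):
        self.patience = patience                # assumed: epochs to tolerate without improvement
        self.min_improvement = min_improvement  # assumed: relative improvement threshold
        self.previous_cumulative = 0.0
        self.best_delta = float("inf")
        self.bad_epochs = 0

    def on_epoch_end(self, model):
        cumulative = model.get_latest_training_loss()
        delta = cumulative - self.previous_cumulative  # loss attributable to this epoch alone
        self.previous_cumulative = cumulative

        if delta < self.best_delta * (1 - self.min_improvement):
            self.best_delta = delta
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1

        if self.bad_epochs >= self.patience:
            # Raising an exception here aborts train(); a workaround, not an official API.
            raise RuntimeError("No loss improvement for {} epochs".format(self.patience))

As gojomo's comments note, a lower loss does not guarantee better word-vectors, so any stopping rule like this should be validated against downstream tasks.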

As proposed by gojomo, you can calculate the difference in loss in the callback function:

from gensim.models.callbacks import CallbackAny2Vec
from gensim.models import Word2Vec

# init callback class
class callback(CallbackAny2Vec):
    """
    Callback to print loss after each epoch
    """
    def __init__(self):
        self.epoch = 0

    def on_epoch_end(self, model):
        loss = model.get_latest_training_loss()
        if self.epoch == 0:
            print('Loss after epoch {}: {}'.format(self.epoch, loss))
        else:
            print('Loss after epoch {}: {}'.format(self.epoch, loss - self.loss_previous_step))
        self.epoch += 1
        self.loss_previous_step = loss

For the training of your model, add compute_loss=True and callbacks=[callback()] to the Word2Vec train method:

# init word2vec class
w2v_model = Word2Vec(min_count=20,
                     window=12,
                     size=100,
                     workers=2)
# build vocab
w2v_model.build_vocab(sentences)

# train the w2v model
w2v_model.train(sentences,
                total_examples=w2v_model.corpus_count,
                epochs=10,
                report_delay=1,
                compute_loss=True,  # set compute_loss = True
                callbacks=[callback()])  # add the callback class

# save the word2vec model
w2v_model.save('word2vec.model')

This will output something like this:

Loss after epoch 0: 4448638.5
Loss after epoch 1: 3283735.5
Loss after epoch 2: 2826198.0
Loss after epoch 3: 2680974.0
Loss after epoch 4: 2601113.0
Loss after epoch 5: 2271333.0
Loss after epoch 6: 2052050.0
Loss after epoch 7: 2011768.0
Loss after epoch 8: 1927454.0
Loss after epoch 9: 1887798.0
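To reuse the saved model later, it can be loaded back with Word2Vec.load:

from gensim.models import Word2Vec

# reload the trained model from disk
w2v_model = Word2Vec.load('word2vec.model')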

– lux7