I am trying to continue training an existing model:

model = gensim.models.Word2Vec.load('model/corpus.zhwiki.word.model')
more_sentences = [['Advanced', 'users', 'can', 'load', 'a', 'model', 'and', 'continue', 'training', 'it', 'with', 'more', 'sentences']]    
model.build_vocab(more_sentences, update=True)
model.train(more_sentences, total_examples=model.corpus_count, epochs=model.iter)

but I get an error on the last line:

AttributeError: 'Word2Vec' object has no attribute 'compute_loss'

Some posts said this is caused by the model having been saved with an earlier version of gensim, so I tried adding this line after loading the existing model and before calling train():

model.compute_loss = False

After that, the AttributeError went away, but model.train() returns 0 and the model is not trained on the new sentences.


How can I solve this problem?


2 Answers

Here is how I continue training my model:

from gensim.models import Word2Vec

# training_data: the initial training data, a list of tokenized sentences
model = Word2Vec(training_data, size=50, window=5, min_count=10, workers=4)

# datasmall: more sentences
# total_examples: the number of additional sentences
# epochs: provide your current epochs; model.epochs is fine
model.train(datasmall, total_examples=len(datasmall), epochs=model.epochs)
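
Note that train() silently skips words that are not already in the model's vocabulary, so if datasmall contains new words they won't be trained. A minimal sketch of extending the vocabulary first, assuming gensim 3.x and a hypothetical datasmall:

# hypothetical additional sentences
datasmall = [['some', 'more', 'tokenized', 'sentences']]
model.build_vocab(datasmall, update=True)  # add any new words to the existing vocab
model.train(datasmall, total_examples=len(datasmall), epochs=model.epochs)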
– Haha TTpro

The total_examples (and epochs) arguments to train() should match what you're currently providing in more_sentences, not leftover values from prior training.

So for example, given your code showing just a single additional sentence, you'd specify total_examples=1.
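
With the question's own names, that call would be (a sketch; model.iter holds the epoch count in the gensim version the question's code uses):

model.train(more_sentences, total_examples=len(more_sentences), epochs=model.iter)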

If this isn't the source of the problem, double check that more_sentences is what you expect it to be at the time of the train() call.

– gojomo
  • I have changed to total_examples=1 and epochs=1, but it gives the same result. And `more_sentences` has the value that I expect to pass to the train() method – dididaisy Jan 28 '18 at 09:13
  • Are you sure all the words in the new text appear more often than `min_count`? (All words being skipped for being low-frequency would be a possible explanation for a `0` trained-word count.) – gojomo Jan 30 '18 at 02:17
  • Ya, that is possible, but I can't give the `min_count` parameter in the `train()` method. Maybe the `min_count` is too low in the original model? Is there any way to set/change `min_count`? – dididaisy Jan 30 '18 at 02:26
  • If infrequent words you'd like retained are being lost, the original `min_count` is too high. You might be able to direct-reassign `model.min_count` before doing your `build_vocab(..., update=True)` (see the sketch below). However, note that eliminating low-frequency words is usually good for vector quality & training time – and a process of adding individual new sentences is unlikely to work well. (A model becomes self-consistently good via the interleaved training of many varied examples; toy-sized corpuses or toy-sized updates are unlikely to work well.) – gojomo Jan 30 '18 at 04:14
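
A minimal sketch of that suggestion, reusing the question's code; whether `model.min_count` is the attribute that build_vocab() actually consults depends on the gensim version, so treat this as an assumption to verify:

model = gensim.models.Word2Vec.load('model/corpus.zhwiki.word.model')
model.min_count = 1  # hypothetical: keep even rare words from the new sentences
model.build_vocab(more_sentences, update=True)
model.train(more_sentences, total_examples=len(more_sentences), epochs=model.iter)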