I am trying to continue training an existing model:

model = gensim.models.Word2Vec.load('model/corpus.zhwiki.word.model')
more_sentences = [['Advanced', 'users', 'can', 'load', 'a', 'model', 'and', 'continue', 'training', 'it', 'with', 'more', 'sentences']]    
model.build_vocab(more_sentences, update=True)
model.train(more_sentences, total_examples=model.corpus_count, epochs=model.iter)

but I get an error on the last line:

AttributeError: 'Word2Vec' object has no attribute 'compute_loss'

Some posts said this is caused by the model having been saved with an earlier version of gensim, so I tried adding this line after loading the existing model and before calling train():

model.compute_loss = False

After that, the AttributeError went away, but model.train() returns 0 and the model is not trained on the new sentences.


How can I solve this problem?


2 Answers

Here is how I continue training my model:

from gensim.models import Word2Vec

# training_data: the initial training data, a list of tokenized sentences
model = Word2Vec(training_data, size=50, window=5, min_count=10, workers=4)

# datasmall: more sentences
# total_examples: the number of additional sentences
# epochs: provide your current epochs; model.epochs is fine
model.train(datasmall, total_examples=len(datasmall), epochs=model.epochs)
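
Note that train() silently skips words that are not already in the model's vocabulary, so if datasmall contains new words they won't be trained. A minimal sketch of extending the vocabulary first, assuming gensim 3.x and a hypothetical datasmall:

# hypothetical additional sentences
datasmall = [['some', 'more', 'tokenized', 'sentences']]
model.build_vocab(datasmall, update=True)  # add any new words to the existing vocab
model.train(datasmall, total_examples=len(datasmall), epochs=model.epochs)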
– Haha TTpro

The total_examples (and epochs) arguments to train() should match what you're currently providing in more_sentences, not leftover values from prior training.

So for example, given your code showing just a single additional sentence, you'd specify total_examples=1.
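
With the question's own names, that call would be (a sketch; model.iter holds the epoch count in the gensim version the question's code uses):

model.train(more_sentences, total_examples=len(more_sentences), epochs=model.iter)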

If this isn't the source of the problem, double check that more_sentences is what you expect it to be at the time of the train() call.

– gojomo
  • I have changed to total_examples=1 and epochs=1, but it gives the same result. And `more_sentences` has the value that I expect to pass to the train() method – dididaisy Jan 28 '18 at 09:13
  • Are you sure all the words in the new text appear more often than `min_count`? (All words being skipped for being low-frequency would be a possible explanation for a `0` trained-word count.) – gojomo Jan 30 '18 at 02:17
  • Ya, that is possible, but I can't give the `min_count` parameter in the `train()` method. Maybe the `min_count` is too low in the original model? Is there any way to set/change `min_count`? – dididaisy Jan 30 '18 at 02:26
  • If infrequent words you'd like retained are being lost, the original `min_count` is too high. You might be able to direct-reassign `model.min_count` before doing your `build_vocab(..., update=True)` (see the sketch below). However, note that eliminating low-frequency words is usually good for vector quality & training time – and a process of adding individual new sentences is unlikely to work well. (A model becomes self-consistently good via the interleaved training of many varied examples; toy-sized corpuses or toy-sized updates are unlikely to work well.) – gojomo Jan 30 '18 at 04:14
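
A minimal sketch of that suggestion, reusing the question's code; whether `model.min_count` is the attribute that build_vocab() actually consults depends on the gensim version, so treat this as an assumption to verify:

model = gensim.models.Word2Vec.load('model/corpus.zhwiki.word.model')
model.min_count = 1  # hypothetical: keep even rare words from the new sentences
model.build_vocab(more_sentences, update=True)
model.train(more_sentences, total_examples=len(more_sentences), epochs=model.iter)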