I would like to retrain an existing classifier (a .pickle file, trained with nltk-trainer) on newly obtained data. I trained the classifier using these links as references: [1], [2].

At the moment I train a new classifier on all the data every time I receive new training data, but this is cumbersome: retraining on the entire dataset again and again is expensive in both time and computation.

Is there a better way?

alvas
Chenna V
  • Online models are different from offline ones, and most of the NLTK taggers are offline models (see https://en.wikipedia.org/wiki/Online_machine_learning). If you would like an "update-able" model, you can try the `partial_fit` estimators in scikit-learn: http://scikit-learn.org/stable/modules/scaling_strategies.html#incremental-learning – alvas Apr 06 '16 at 08:44
  • See http://stackoverflow.com/questions/16128834/train-scikit-svm-one-by-one-online-or-stochastic-training too. – alvas Apr 06 '16 at 08:46
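
To illustrate the `partial_fit` suggestion in the comments above, here is a minimal sketch, assuming the NLTK classifier is swapped for a scikit-learn one. It is not the nltk-trainer workflow from the question: the texts, the labels `"pos"`/`"neg"`, and the choice of `MultinomialNB` with `HashingVectorizer` are all illustrative assumptions. The idea is that the pickled model can be loaded and updated with only the new batch, rather than retrained on everything.

```python
import pickle

from sklearn.feature_extraction.text import HashingVectorizer
from sklearn.naive_bayes import MultinomialNB

# HashingVectorizer is stateless, so there is no vocabulary to refit when
# new text arrives. alternate_sign=False keeps the features non-negative,
# which MultinomialNB requires.
vectorizer = HashingVectorizer(n_features=2**18, alternate_sign=False)

# Hypothetical label set; every class must be declared on the first call.
classes = ["neg", "pos"]

# Initial training on the first (made-up) batch.
clf = MultinomialNB()
clf.partial_fit(
    vectorizer.transform(["good great film", "bad awful film"]),
    ["pos", "neg"],
    classes=classes,
)

# Persist the model, as one would between training sessions.
saved_model = pickle.dumps(clf)

# Later: load the model and update it with the new batch only.
clf = pickle.loads(saved_model)
clf.partial_fit(
    vectorizer.transform(["great wonderful acting", "awful boring plot"]),
    ["pos", "neg"],
)

predictions = [
    str(label)
    for label in clf.predict(
        vectorizer.transform(["great wonderful", "awful boring"])
    )
]
print(predictions)
```

A hashing vectorizer is used here because a vocabulary-based one would have to be refit whenever unseen words appear, which would defeat the purpose of updating incrementally.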

0 Answers