I loaded an already trained SGDClassifier model and tried to call partial_fit on it again with a new feature set and new data, but received `ValueError: classes should include all valid labels that can be in y`. My class_weight is None because I want every class to have equal weight.

import joblib

# f is the path (or file object) of the previously saved model
model_predicted_networktype = joblib.load(f)
new_training_data_count_matrix = count_vect_predicted_networktype.transform(training_dataset)
new_training_tf_idf = tf_idf(new_training_data_count_matrix)
# this call raises the ValueError
model_predicted_networktype.partial_fit(new_training_tf_idf, training_labels)

I understand that the issue arises because I am adding new features to my already trained model, and they differ from the ones it was previously fitted on. But how can I add new features to a model that has already been trained with partial_fit?

Chetan Kabra
  • Nothing to do with features. The first call to partial_fit() should include all your different classes in a parameter called `classes`, even if your actual `training_labels` contain only some of them. See the [documentation of `partial_fit()` here](http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.SGDClassifier.html#sklearn.linear_model.SGDClassifier.partial_fit) – Vivek Kumar May 20 '17 at 02:37
  • Is there any way I can add labels on the fly? – Chetan Kabra May 22 '17 at 18:43
  • No. All possible labels have to be declared in the first call. The training data X, y need not contain all labels at that time. See the examples. – Vivek Kumar May 23 '17 at 01:57
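
The point made in the comments can be illustrated with a minimal sketch. The random data, the three integer labels, and the names `all_classes`, `X1`, `X2` are assumptions made up for illustration, not taken from the question:

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.default_rng(0)

# Hypothetical: every label that may ever appear, declared up front.
all_classes = np.array([0, 1, 2])

clf = SGDClassifier(random_state=0)

# The first batch only contains labels 0 and 1,
# but all three classes are declared in this first call.
X1 = rng.normal(size=(20, 5))
y1 = np.array([0, 1] * 10)
clf.partial_fit(X1, y1, classes=all_classes)

# A later batch introduces label 2; `classes` can be omitted now.
X2 = rng.normal(size=(20, 5))
y2 = np.array([1, 2] * 10)
clf.partial_fit(X2, y2)
```

After the second call the model holds one weight vector per declared class, including the class that was absent from the first batch.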

1 Answer

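Pass `classes=numpy.arange(some_estimated_max_number)` in your first call to partial_fit, and keep your own mapping from those integers to the actual labels. The integers act as reserved slots, so a label you have not seen before can be assigned the next free slot and added on the fly, as long as you stay below the estimated maximum.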
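
A rough sketch of this slot-reservation idea follows. The bound `MAX_CLASSES`, the helpers `encode` and `decode`, the label strings, and the random features are all hypothetical and only serve to show the mechanics:

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

MAX_CLASSES = 10   # assumed upper bound on the number of distinct labels
label_to_id = {}   # grows as new labels show up


def encode(labels):
    """Map labels to integer slots, giving each new label the next free slot."""
    ids = []
    for label in labels:
        if label not in label_to_id:
            if len(label_to_id) >= MAX_CLASSES:
                raise ValueError("more labels than reserved slots")
            label_to_id[label] = len(label_to_id)
        ids.append(label_to_id[label])
    return np.array(ids)


def decode(ids):
    """Map integer slots back to labels; slots not yet assigned give None."""
    id_to_label = {i: label for label, i in label_to_id.items()}
    return [id_to_label.get(int(i)) for i in ids]


rng = np.random.default_rng(0)
clf = SGDClassifier(random_state=0)

# First call: reserve every slot, even though only two labels exist so far.
X1 = rng.normal(size=(20, 4))
clf.partial_fit(X1, encode(["wifi", "lte"] * 10), classes=np.arange(MAX_CLASSES))

# Later call: "ethernet" is new and simply takes the next free slot.
X2 = rng.normal(size=(20, 4))
clf.partial_fit(X2, encode(["lte", "ethernet"] * 10))

predicted = decode(clf.predict(X2))
```

The mapping has to be saved together with the model (for example in the same joblib file), otherwise the integer predictions cannot be translated back after reloading. The trade-off is that the model carries weights for slots that may never be used.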

user1