I've seen plenty of questions about this topic but couldn't find a clear answer that solves my problem. I save a model with the following code:
import pickle
from sklearn.svm import SVC

clf = SVC(gamma=1, C=1)
clf.fit(X_train, y_train)
# save the model to disk
filename = 'finalized_model.sav'
with open(filename, 'wb') as f:
    pickle.dump(clf, f)
I then load it in a different file:
import pickle

# load the model from disk
fname = 'finalized_model.sav'
with open(fname, 'rb') as f:
    clf = pickle.load(f)
y_pred = clf.predict(df_live)
I get this error:
ValueError: X.shape[1] = 22 should be equal to 26, the number of features at training time
When I prepare the data, I use:
df_dummies = pd.get_dummies(df)
The training data ends up with more features because it is much larger than the data I predict on, so it contains more distinct categories and get_dummies produces more dummy columns for it.
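Here is a minimal sketch that reproduces the mismatch on a small scale. The column name `color` and its values are made up for illustration and are not my real data:

```python
import pandas as pd

# Hypothetical data: the training set contains three categories,
# the live set contains only two of them.
df_train = pd.DataFrame({"color": ["red", "green", "blue", "red"]})
df_live = pd.DataFrame({"color": ["red", "green"]})

# Each frame is encoded independently, so each gets one dummy
# column per category that happens to appear in it.
train_dummies = pd.get_dummies(df_train)
live_dummies = pd.get_dummies(df_live)

print(train_dummies.shape[1])  # 3
print(live_dummies.shape[1])   # 2

# Dummy columns the model saw during training but that are absent at prediction time
missing = sorted(set(train_dummies.columns) - set(live_dummies.columns))
print(missing)  # ['color_blue']
```

In my real data the same effect gives 26 columns at training time and 22 at prediction time.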
My question is: what is the best practice for making the number of features match at training and prediction time without harming the model?
Thanks