
I have six text features (say f1, f2, …, f6) available for the data on which I trained a model. But when the model is deployed and a new data point arrives for prediction, it has only two features (f1 and f2), so there is a feature mismatch. How can I tackle this problem? I have a couple of ideas, but neither is very efficient.

  1. Use only the two features (f1 and f2) for training and discard the others (f3, …, f6). But this loses information, and my test-set accuracy decreases.
  2. Learn some relation between (f3, …, f6) and (f1, f2), so that even though (f3, …, f6) are missing from the new data point, their information can be recovered from f1 and f2 alone.
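Option 2 above can be sketched as follows: fit a simple linear map from the available features (f1, f2) to each missing feature (shown here for f3 only), then use it to impute f3 at prediction time. This is a minimal pure-Python illustration; the training data and the exact linear relationship are made up for the example.

```python
# Sketch of option 2: learn f3 ~ a*f1 + b*f2 + c from the training set,
# then impute f3 for a new point that arrives with only (f1, f2).
# Least squares is done by hand via the 3x3 normal equations.

def fit_linear(X, y):
    """Fit y ~ a*x1 + b*x2 + c by solving the normal equations."""
    rows = [[x1, x2, 1.0] for (x1, x2) in X]  # design matrix columns [x1, x2, 1]
    n = 3
    # Normal equations: (R^T R) w = R^T y
    A = [[sum(r[i] * r[j] for r in rows) for j in range(n)] for i in range(n)]
    b = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(n)]
    # Gaussian elimination with partial pivoting
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, n):
            f = A[r][col] / A[col][col]
            for c in range(col, n):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    # Back substitution
    w = [0.0] * n
    for r in range(n - 1, -1, -1):
        w[r] = (b[r] - sum(A[r][c] * w[c] for c in range(r + 1, n))) / A[r][r]
    return w  # [a, b, c]

# Toy training data where f3 happens to equal 2*f1 + f2 + 1 exactly
train_f1f2 = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 3)]
train_f3 = [2 * x1 + x2 + 1 for (x1, x2) in train_f1f2]

a, b, c = fit_linear(train_f1f2, train_f3)

def impute_f3(f1, f2):
    """Estimate the missing f3 from the two available features."""
    return a * f1 + b * f2 + c

print(round(impute_f3(4, 5), 3))  # recovers 2*4 + 5 + 1 = 14.0
```

In practice you would fit one such model per missing feature (f3 through f6), and a nonlinear regressor may be needed if the relationship is not linear. Note this only helps if (f3, …, f6) really are predictable from (f1, f2); if they carry independent information, nothing can recover it.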

2 Answers


The best option is, of course, to train a new model using f1, f2, and any new data you have.

Don't want to do that? If you don't have f3, …, f6, you shouldn't expect the model to magically work as intended.

Now, think about what those features f3, …, f6 actually are. Are they related to the information you do have? If so, you may be able to approximate them. We can't tell you exactly what to do because we have no clue what they are: interpolation? Regression? A rough approximation?
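As one example of a "rough approximation": nearest-neighbour imputation, where at prediction time you find the training row whose (f1, f2) is closest to the new point and copy its (f3, …, f6). A minimal sketch, with made-up toy data:

```python
# Nearest-neighbour imputation: borrow (f3..f6) from the training row
# closest to the new point in (f1, f2) space. Toy data for illustration.

# Each training row: (f1, f2, f3, f4, f5, f6)
train = [
    (0.0, 0.0, 10.0, 1.0, 5.0, 0.1),
    (1.0, 1.0, 20.0, 2.0, 6.0, 0.2),
    (5.0, 5.0, 99.0, 9.0, 9.0, 0.9),
]

def impute_missing(f1, f2):
    """Return (f3, f4, f5, f6) from the nearest training row in (f1, f2)."""
    nearest = min(train, key=lambda r: (r[0] - f1) ** 2 + (r[1] - f2) ** 2)
    return nearest[2:]

# A new point arrives with only (f1, f2); complete it before predicting.
full_point = (0.9, 1.1) + impute_missing(0.9, 1.1)
print(full_point)  # (0.9, 1.1, 20.0, 2.0, 6.0, 0.2)
```

Whether this is reasonable depends entirely on whether points close in (f1, f2) really tend to share similar values of the other features, which is exactly the question only you can answer for your data.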

My suggestion: you are missing most of your model's predictors, so the old model is meaningless. Please just train a new one.

ABCD

Perhaps you could fill in f3 through f6 with neutral values: for each missing feature, use its average over all the training data that includes it. That way the imputed values won't stand out and won't lean your classifier one way or the other, so the classifier will rely mostly on the features actually provided, f1 and f2.

When computing these averages, calculate them per class first and then average the class means. That way, if your data set contains a large amount of one class, it won't skew the average.
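The class-balanced average described above can be sketched like this, with made-up toy data where one class is heavily over-represented:

```python
# Class-balanced mean: average the per-class means so that an
# over-represented class gets one vote, not one vote per sample.

def balanced_mean(values, labels):
    """Average of per-class means of `values`, grouped by `labels`."""
    by_class = {}
    for v, c in zip(values, labels):
        by_class.setdefault(c, []).append(v)
    class_means = [sum(vs) / len(vs) for vs in by_class.values()]
    return sum(class_means) / len(class_means)

# Four samples of class "A" (f3 = 1.0) vs one sample of class "B" (f3 = 9.0)
f3_values = [1.0, 1.0, 1.0, 1.0, 9.0]
f3_labels = ["A", "A", "A", "A", "B"]

print(balanced_mean(f3_values, f3_labels))  # 5.0 = (mean_A + mean_B) / 2
print(sum(f3_values) / len(f3_values))      # 2.6, the plain mean, skewed toward A
```

The balanced value (5.0) sits midway between the two class means, whereas the plain mean (2.6) is dragged toward the majority class.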

Of course, this may be an oversimplification; it would work best with binary classification, and it depends on the data set and the task.

Hope this helps :)