
After building a model, we save it so that it can make live predictions. Saving the model is simple when there is no feature engineering. But suppose I have used, say, a chi-square test or a random forest to select the features that contribute most to model accuracy. When I save this model, the features it was built on will be entirely different from the raw data that was originally fed into training, so raw live data will not match what the model expects.

Thanks in advance.

  • I'm afraid I don't get your question. You have a data set `x` which has all the raw data. You create, transform, and adjust some variables in `x` (feature engineering), then train your model on the transformed `x`, find the optimal one (this can involve creating more columns), and end up with a final `x` and a final model. You can just save your final model, and it should match your last `x`, shouldn't it? – cimentadaj Dec 19 '19 at 07:06

1 Answer


TL;DR: You have to run the same feature generation pipeline on your unseen data before passing it to the model.

Long version: The model does not store the features; it stores the parameters. For example, suppose you have 10 points in the Cartesian plane (the x and y coordinates are the features) and you transform them to polar coordinates, r and theta. You then model the data as a circle: based on the transformed features (the coordinates in polar space), you calculate the best-fitting center C and radius R. You can then save the center and radius as the model. What the model holds is the parameters C and R, not the features.

Now, given a new point, you transform it into polar space before using the model to make a decision. So the feature generation pipeline (the transformation to polar space in this example) together with the model (the center and radius) is all you need at prediction time. Hope this clears up the doubt.
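To make the polar example concrete, here is a minimal sketch in plain Python. It assumes, for simplicity, a circle centered at the origin, so the only learned parameter is the radius. The names `to_polar`, `fit`, and `predict` are made up for illustration, and the "model" is just a small dictionary saved as JSON.

```python
import json
import math

def to_polar(point):
    """Feature pipeline: raw Cartesian (x, y) -> polar (r, theta)."""
    x, y = point
    return math.hypot(x, y), math.atan2(y, x)

def fit(raw_points):
    """Train on the transformed features; return only the learned parameter."""
    radii = [to_polar(p)[0] for p in raw_points]
    # Assumption: circle centered at the origin, radius = mean distance.
    return {"radius": sum(radii) / len(radii)}

def predict(model, raw_point):
    """Run the SAME pipeline on a raw point, then apply the saved parameter."""
    r, _theta = to_polar(raw_point)
    return r <= model["radius"]  # True if the point lies inside the circle

train = [(3.0, 4.0), (-5.0, 0.0), (0.0, 5.0), (4.0, -3.0)]
saved = json.dumps(fit(train))  # what gets persisted: parameters, no features
model = json.loads(saved)       # later, in the live prediction service

inside = predict(model, (1.0, 1.0))   # raw point, transformed inside predict
outside = predict(model, (6.0, 8.0))
```

Note that `predict` takes raw points and calls `to_polar` itself, so the transformation has to ship alongside the saved parameters. If you are using scikit-learn, one common way to get the same effect is to wrap the feature steps and the estimator in a single `Pipeline` object and save that object as a whole.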

Sukuya