I am trying to use StreamingLogisticRegressionWithSGD to build a CTR prediction model.
The documentation mentions that numFeatures should be constant.
The problem I am facing is this: since most of my variables are categorical, numFeatures has to be the final number of features after encoding the categorical variables and parsing them into LabeledPoint format.
Suppose that, for a categorical variable x1, I have 10 distinct values in the current window.
But in the next window some new values get added to x1 and the number of distinct values increases. How should I handle numFeatures in this case, since it will now change?
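To make the problem concrete, here is a minimal sketch in plain Python (no Spark involved). The item names, the two windows, and the helper one_hot_encode are made up for illustration; the point is only that the encoded vector length follows the number of distinct values seen so far.

```python
# Hypothetical illustration: one-hot width depends on the distinct values seen.

def one_hot_encode(value, vocabulary):
    """Encode `value` as a one-hot list over the sorted distinct vocabulary."""
    index = {v: i for i, v in enumerate(sorted(set(vocabulary)))}
    vec = [0.0] * len(index)
    vec[index[value]] = 1.0
    return vec

# Window 1: x1 takes 10 distinct (made-up) values.
window_1 = ["item_%d" % i for i in range(10)]

# Window 2: two previously unseen values show up for x1.
window_2 = window_1 + ["item_10", "item_11"]

width_1 = len(one_hot_encode("item_3", window_1))  # 10 columns
width_2 = len(one_hot_encode("item_3", window_2))  # 12 columns

print(width_1, width_2)
```

So a model created with numFeatures equal to the first width no longer matches the vectors built from the second window.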
Basically, my question is: how should I handle new values of categorical variables in a streaming model?
Thanks, Kundan