
I used the `rxLogit` function in Revolution R (package RevoScaleR) to fit a logistic regression model on data that has many categorical variables (for example STATE: IL, FL, OH, CA, TX, ...) and a couple of numeric variables. When I try to score data using the `rxPredict` function, I get this error:

```
INTERNAL ERROR: In rxPredict, the number of parameters does not match the number of  variables: 16 vs. 45.
```

One thing I understood is that the scoring data has fewer categories (say, only data with STATE = IL), so the number of parameters provided to the model is smaller than the number of variables in the model.

The code that I wrote is:

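```r
library(RevoScaleR)

# Fit a logistic regression of Flag on two numeric and four categorical predictors
winlossmodel <- rxLogit(Flag ~ price + quantity + product_group + state +
                          mode_of_transportation + cluster,
                        data = training_data)

# Score the new data; type = "response" returns predicted probabilities,
# checkFactorLevels = FALSE skips checking factor levels against the model's
winlosspredict <- rxPredict(winlossmodel, data = scoring_data,
                            type = "response", checkFactorLevels = FALSE)
```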

Here price and quantity are continuous variables; state, product_group, mode_of_transportation and cluster are categorical. My training_data has data for all the states, but scoring_data has data for only a few states. Is this causing the error?
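A plausible illustration of the mechanism described above, sketched in base R with `model.matrix` rather than RevoScaleR (so it does not reproduce `rxPredict` internals): each categorical variable expands into one dummy column per level minus a baseline, so a scoring set that carries fewer factor levels produces a narrower design matrix than the one the model was fitted on. The toy data frames are made up for illustration.

```r
# Made-up toy data: training covers three states, scoring only two of them.
training_data <- data.frame(
  price = c(10, 12, 9, 11, 14, 8),
  state = factor(c("IL", "FL", "OH", "IL", "FL", "OH"))
)
scoring_data <- data.frame(
  price = c(10, 13),
  state = factor(c("IL", "FL"))
)

# Treatment coding gives each factor (number of levels - 1) dummy columns.
n_train  <- ncol(model.matrix(~ price + state, data = training_data))  # 4: intercept, price, stateIL, stateOH
n_before <- ncol(model.matrix(~ price + state, data = scoring_data))   # 3: intercept, price, stateIL

# Re-declare the scoring factor with the training levels; "OH" becomes an
# unused level, but it still gets its (all-zero) dummy column.
scoring_data$state <- factor(scoring_data$state,
                             levels = levels(training_data$state))
n_after <- ncol(model.matrix(~ price + state, data = scoring_data))    # 4
```

With only one state present in the scoring data, `model.matrix` would instead stop with a contrasts error, because a factor needs at least two levels; declaring the training levels avoids that as well.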

  • Yes, I would say so. Try limiting training_data to the same categories as scoring_data to check whether that affects the error. – llrs Feb 27 '14 at 16:29
  • Yes, it does affect the error. But is that the correct way of fitting the model? If I only take training data with one product group, one state and one mode of transportation, I am building a biased model, aren't I? I think I need to build a model over all the product groups, all states and all categories of mode_of_transportation, and then predict a response for one particular state, product_group and mode_of_transportation. Please correct me if I am missing something or if I am wrong. Thanks – Sid Feb 27 '14 at 17:04
  • I am not a statistician, but yes, you should use all the states with all the data. If you have doubts about the statistics, ask [here](http://stats.stackexchange.com/) – llrs Feb 27 '14 at 21:24
  • Has anyone had luck with this? I have a similar question here: http://stackoverflow.com/questions/38790530/revoscaler-rxpredict-the-number-of-parameters-does-not-match-the-number-of-var. I agree with you that the model should be parameterised on all available data, and the `rxPredict` function should then be able to predict outcomes for a single input or a few inputs; in your example, a single state and product group (assuming price and quantity aren't categorical). – Hamilton Blake Aug 05 '16 at 13:46
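For the approach described in these comments (fit on all states, product groups and modes of transportation, then score a single combination), one option is to re-declare every categorical column of the scoring data with the training levels before scoring. The helper `align_factor_levels` below is hypothetical, not a RevoScaleR function; it assumes both data sets are in-memory data frames, and the data are made up.

```r
# Hypothetical helper (not part of RevoScaleR): give every listed column in
# `scoring` exactly the factor levels that column has in `training`.
align_factor_levels <- function(scoring, training, cols) {
  for (col in cols) {
    scoring[[col]] <- factor(scoring[[col]], levels = levels(training[[col]]))
  }
  scoring
}

# Made-up data: training spans all states and product groups,
# scoring holds a single state/product_group combination.
training_data <- data.frame(
  state         = factor(c("IL", "FL", "OH", "CA")),
  product_group = factor(c("A", "B", "A", "B"))
)
scoring_data <- data.frame(
  state         = c("IL", "IL"),
  product_group = c("A", "A"),
  stringsAsFactors = FALSE
)

scoring_data <- align_factor_levels(scoring_data, training_data,
                                    c("state", "product_group"))
levels(scoring_data$state)  # "CA" "FL" "IL" "OH"
```

The aligned frame would then go into the same `rxPredict` call as in the question; whether this clears the RevoScaleR error is not confirmed in this thread. Any scoring value absent from the training levels becomes `NA`, so it is worth checking for those before scoring.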
