I have created a model using bic.glm, and I am trying to predict probabilities on validation data that does not contain the dependent variable 'is_blocked'.

When I run the predict() function on the validation data, I get the following error:

Error in eval(expr, envir, enclos) : object 'is_blocked' not found

Why would I get this error, when is_blocked is the variable I'm trying to predict?

Walter Williams
  • In the source of predict.bic.glm() I found: if (!is.null(object$formula)) { newdata <- model.matrix(object$formula, data = newdata)[,-1] } So the first variable, which is very likely the dependent one, is removed. Hence, try running it with the complete data frame. What happens then? I have often seen predict methods take the entire data.frame and discard the dependent variable. This may not make sense, but at least you do not have to remove the variable yourself. Also, the method may be extended in the future to require the dependent variable, so this way it's backwards compatible. – Dmitrii I. Aug 20 '14 at 15:31
  • Actually, I misread the code. The intercept is discarded, not the dependent variable. Still, model.matrix() seems to need the dependent variable to construct the model matrix. – Dmitrii I. Aug 20 '14 at 15:45
  • The validation data I have does not include the dependent variable. It comes from a Kaggle competition where we submit the probability of the dependent variable being true. – Walter Williams Aug 20 '14 at 16:54
  • Could you provide a minimal working example? Then we will be able to help you faster. I would try adding a vector of ones or something as dependent variable... It should be ignored anyway. – Dmitrii I. Aug 21 '14 at 07:07
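
The behaviour discussed in the comments can be reproduced in base R without the model itself. The sketch below is only an illustration: the predictors x1 and x2, the toy data, and the object names are hypothetical, and it calls model.matrix() directly rather than predict() on a bic.glm fit. It shows that a formula with a response fails on data lacking that column, and two possible ways around it: adding a placeholder response column (as suggested in the last comment), or dropping the response from the formula with delete.response().

```r
# Toy validation data; x1 and x2 are hypothetical predictor names.
set.seed(1)
valid <- data.frame(x1 = rnorm(5), x2 = rnorm(5))

# Formula with the response on the left-hand side, as stored in the model.
form <- is_blocked ~ x1 + x2

# model.matrix() evaluates the response too, so this fails because
# 'is_blocked' is not a column of valid (and not defined elsewhere).
res <- try(model.matrix(form, data = valid), silent = TRUE)
failed <- inherits(res, "try-error")

# Workaround 1: add a placeholder response column. Its values do not
# enter the design matrix; the column only has to exist.
valid_padded <- valid
valid_padded$is_blocked <- 0
mm1 <- model.matrix(form, data = valid_padded)[, -1]

# Workaround 2: strip the response from the formula's terms first.
mm2 <- model.matrix(delete.response(terms(form)), data = valid)[, -1]
```

If the placeholder approach applies here, passing a padded data frame such as valid_padded as newdata to predict() would be the corresponding step for the bic.glm model; that call is not tested in this sketch.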
