
In my logistic regression in R, I am trying to create a contingency table comparing fitted to observed values (i.e. 0 or 1 actual vs. 0 or 1 fitted value). However, my data has missing values in some rows of several variables, so the vector of fitted values is shorter than the original data set. Here is an example:

test <- data.frame(male=c(1,0,1,0,0,1,1,0,1,0,0,1), 
                 height=c(58,100,NA,19,20,69,58,24,46,19,97,69))

model <- glm(male~height, family=binomial("logit"),data=test)

check_model <- table(test$male,fitted.values(model)>0.5)

Error in table(test$male, fitted.values(model) > 0.5) : all arguments must have the same length

Does anyone know of a way to pass in the actual values (test$male) only for the rows where the model actually has a fitted value?

Cleb
user1533277
  • Did you realize that your code implies that you think there is a function named `fitted.values`? Had you simply typed `?fitted` at the console (or perhaps `str(model)`), you would have made more rapid progress. – IRTFM Jul 18 '12 at 02:15

2 Answers


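If you look at ?glm, you will see that the fitted object includes the model frame as a component (named model) by default, controlled by the argument model = TRUE.

The model frame contains the data actually used to fit the model, so rows with missing values have already been dropped.

Thus you can use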

table(model.frame(model)$male, fitted(model) > 0.5)

or

table(model$model$male, fitted(model) > 0.5)

which returns the required result:

##      FALSE TRUE
##   0     4    2
##   1     3    2
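
As a possible alternative (not part of the original answer), the model can be refitted with na.action = na.exclude. This is a minimal sketch under that assumption: the incomplete rows are still left out of the fit, but fitted() pads the result with NA so that it lines up with the original data. The names model_ex and fitted_ex are made up for illustration.

```r
# Toy data from the question; row 3 has a missing height.
test <- data.frame(male   = c(1, 0, 1, 0, 0, 1, 1, 0, 1, 0, 0, 1),
                   height = c(58, 100, NA, 19, 20, 69, 58, 24, 46, 19, 97, 69))

# na.exclude drops incomplete rows for fitting, but fitted() pads them back with NA.
model_ex <- glm(male ~ height, family = binomial("logit"),
                data = test, na.action = na.exclude)

# Same length as the original data, with NA where height was missing.
fitted_ex <- fitted(model_ex)

# table() leaves out the NA pair by default (useNA = "no").
check_model <- table(observed = test$male, fitted = fitted_ex > 0.5)
check_model
```

Note that the padding is applied by the extractor fitted(); the raw component model_ex$fitted.values still has only the rows used in the fit.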
mnel
  • `glm` does not return "model.frame". It returns a much more complicated object, from which the function `model.frame` is able to extract the original data. – IRTFM Jul 18 '12 at 02:34
  • I have edited the response to be clearer -- it is returned as *part* of the object of class `"glm"`. – mnel Jul 18 '12 at 02:43
> table(test$male[complete.cases(test)], fitted(model)>0.5)

    FALSE TRUE
  0     4    2
  1     3    2
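
One caveat worth sketching: complete.cases(test) looks at every column of the data frame, so it only matches the fitted values when all columns are used in the model. A variant that should be robust to this is to index by the row names that the fitted values carry. In the sketch below the column other is hypothetical, added only to show the mismatch, and used_rows and observed are illustrative names.

```r
# Data from the question plus a hypothetical column that is NOT in the model.
test <- data.frame(male   = c(1, 0, 1, 0, 0, 1, 1, 0, 1, 0, 0, 1),
                   height = c(58, 100, NA, 19, 20, 69, 58, 24, 46, 19, 97, 69),
                   other  = c(NA, 1:11))

model <- glm(male ~ height, family = binomial("logit"), data = test)

# complete.cases() would also drop row 1 (NA in 'other'), leaving too few rows.
sum(complete.cases(test))   # rows complete across ALL columns
length(fitted(model))       # rows actually used in the fit

# Fitted values are named by the row names of the rows used in the fit.
used_rows <- names(fitted(model))
observed  <- test[used_rows, "male"]

check_model <- table(observed = observed, fitted = fitted(model) > 0.5)
check_model
```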
IRTFM