I am using the glm() function in R with link = "log" to fit my model. I read on various websites that fitted() returns values that can be compared directly with the original data, unlike predict(). I am having a problem when fitting the model.

data<-read.csv("training.csv")
data$X2 <- as.Date(data$X2, format="%m/%d/%Y")
data$X3 <- as.Date(data$X3, format="%m/%d/%Y")
data_subset <- subset(...)
attach(data_subset)

#define variable
Y<-cbind(Y)
X<-cbind(X1,X2,X3,X4,X5,X6,X7,X8,X9,X10,X11,X12,X14)

# correlation among variables
cor(Y,X)

model <- glm(Y ~ X , data_subset,family=Gamma(link="log"))
summary(model)

detach(data_subset)

validation_data<-read.csv("validation.csv")

validation_data$X2 <- as.Date(validation_data$X2, format="%m/%d/%Y")
validation_data$X3 <- as.Date(validation_data$X3, format="%m/%d/%Y")

attach(validation_data)
predicted_valid<-predict(model, newdata=validation_data)

I am not sure how predict() works with a Gamma log link. I want to transform the predicted values so that they can be compared with the original data. Can someone please help me?

Nikita

2 Answers

Looks to me like fitted doesn't work the way you seem to think it does.

You probably want to use predict there, since you seem to want to pass it data.

see ?fitted vs ?predict
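
As a rough illustration of the difference (a minimal sketch on simulated data; the data frame `train`, the columns `x1`, `x2`, `y`, and the coefficients are made up for this example and are not the asker's data): `fitted()` only returns the values for the observations the model was fitted to, on the response scale, while `predict()` is the function that accepts new data.

```r
# Simulated Gamma data with a log-linear mean (all names here are hypothetical)
set.seed(1)
n <- 200
train <- data.frame(x1 = runif(n), x2 = runif(n))
mu <- exp(0.5 + 1.2 * train$x1 - 0.7 * train$x2)
train$y <- rgamma(n, shape = 5, rate = 5 / mu)

fit <- glm(y ~ x1 + x2, data = train, family = Gamma(link = "log"))

# fitted() has no newdata argument: it returns one value per training row,
# already on the scale of the response
f <- fitted(fit)

# predict() without newdata, on the response scale, gives the same values
p <- predict(fit, type = "response")

all.equal(f, p)
length(f) == nrow(train)
```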

Glen_b
  • Even if I use predict(), I face the same issue: predict() also produces output with the same number of samples as the training data. Does it have anything to do with attach or detach? – Nikita Sep 18 '14 at 01:29
  • It has nothing to do with `attach` or `detach`, but for other reasons I'd strongly advise you to avoid using `attach` and `detach`. Where `data=` arguments exist, use them, and more generally use `with`. It means a little more typing, but it will save you a lot of pain later. You must use exactly the right argument names (and predict is a particular bugbear if your new data doesn't have identical column names to the original, though I don't think you have that problem there). – Glen_b Sep 18 '14 at 02:37
  • @glen_b Thanks a lot for helping me. I am getting an error: (1) `'newdata' had 2496 rows but variables found have 1639 rows`; (2) `In predict.lm(object, newdata, se.fit, scale = 1, type = ifelse(type == : prediction from a rank-deficient fit may be misleading` – Nikita Sep 18 '14 at 05:06
  • @glen_b I was using X <- cbind(data_subset$X1, data_subset$X2, ...), and after I changed it to X <- cbind(X1, X2, ...) I am no longer getting an error. I have a question: how do I transform the predicted values so that they are comparable to the original data? – Nikita Sep 20 '14 at 15:39
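
One way to follow the `data=` advice from the comments is to name the columns in the formula rather than building a matrix with `cbind`. The sketch below uses simulated data; `make_data`, `train_df`, `valid_df`, and the two predictors `X1` and `X4` are stand-ins invented for the example, not the asker's actual files. The idea is that `predict()` can then look up the same column names in `newdata`, so the output has one value per validation row rather than one per training row.

```r
# Hypothetical helper that simulates a data set with a Gamma response
set.seed(42)
make_data <- function(n) {
  d <- data.frame(X1 = runif(n), X4 = runif(n))
  d$Y <- rgamma(n, shape = 4, rate = 4 / exp(1 + 0.8 * d$X1 - 0.5 * d$X4))
  d
}
train_df <- make_data(300)
valid_df <- make_data(120)

# Columns are named in the formula and looked up in `data`; no attach() needed
model <- glm(Y ~ X1 + X4, data = train_df, family = Gamma(link = "log"))

# newdata has the same column names, so predict() evaluates the formula on it
pred_valid <- predict(model, newdata = valid_df)

# One prediction per validation row
length(pred_valid) == nrow(valid_df)
```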

Add type="response" to your predict call, to get predictions on the response scale. See ?predict.glm.

predict(model, newdata=validation_data, type="response")
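
To make the two scales concrete, here is a small sketch on simulated data (the names `train_df`, `valid_df`, `x`, `y`, and the RMSE comparison are assumptions for illustration only). With a log link, the default `type = "link"` returns the log of the predicted mean, so exponentiating it should give the same numbers as `type = "response"`, and those are the values to compare with the observed response.

```r
# Simulated training and validation sets (hypothetical names and coefficients)
set.seed(7)
train_df <- data.frame(x = runif(250))
train_df$y <- rgamma(250, shape = 3, rate = 3 / exp(0.3 + 1.5 * train_df$x))
valid_df <- data.frame(x = runif(80))
valid_df$y <- rgamma(80, shape = 3, rate = 3 / exp(0.3 + 1.5 * valid_df$x))

fit <- glm(y ~ x, data = train_df, family = Gamma(link = "log"))

# Default type = "link": predictions on the log scale (the linear predictor)
eta <- predict(fit, newdata = valid_df)

# type = "response": predictions on the scale of the original data
mu <- predict(fit, newdata = valid_df, type = "response")

# For a log link, back-transforming the link-scale prediction is just exp()
all.equal(exp(eta), mu)

# Response-scale predictions can be compared directly with the observed values
rmse <- sqrt(mean((valid_df$y - mu)^2))
rmse
```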
Hong Ooi