I have a question about evaluating the predictions of a logistic regression model. I am fairly new to this, so please bear with me. First I will show what I have done with LDA, because I want to end up with a similar "misclassification rate" once I am done with my logistic regression.
install.packages("ElemStatLearn")
library(ElemStatLearn)
# training data
train = vowel.train
# but we only need the response and the first two predictors
train.new = train[1:3]
# test data
test = vowel.test
test.new = test[1:3]
# normalizing the training data (0 mean and sd 1)
my_scale <- function(x) {
  sweep(sweep(x, 2, colMeans(x)), 2, apply(x, 2, sd), '/')
}
train.scaled = my_scale(train.new[,2:3])
train.scaled = cbind(y = train.new[,1], train.scaled)
test.scaled = my_scale(test.new[,2:3])
test.scaled = cbind(y = test.new[,1], test.scaled)
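Just to convince myself the scaling does what I think, I ran a quick sanity check that each scaled training predictor has (roughly) zero mean and unit standard deviation:
# sanity check: scaled predictors should have mean ~0 and sd ~1
round(colMeans(train.scaled[, -1]), 10)
apply(train.scaled[, -1], 2, sd)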
# LDA
library(MASS)
train.lda = lda(y~ ., data = train.scaled)
train.lda.values <- predict(train.lda, newdata=train.scaled)
train.lda.rate <- mean(train.lda.values$class != train.scaled[,1])
train.lda.rate
test.lda.values <- predict(train.lda, newdata=test.scaled)
test.lda.rate <- mean(test.lda.values$class != test.scaled[,1])
test.lda.rate
This gives a train.lda.rate = 0.5265152 and test.lda.rate = 0.461039.
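In case it is useful, I also look at the full confusion matrix rather than just the rate, e.g. for the test set (just a plain table(), nothing fancy):
# cross-tabulate predicted vs. actual classes on the test set
table(predicted = test.lda.values$class, actual = test.scaled[,1])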
Now I want to pull the same kind of misclassification rate out of logistic regression, but I know that the object predict() returns for a logistic regression model has no $class component. So I am wondering how to obtain the predicted classes, so that I can compare them to the original classes and compute the misclassification rate like I did above.
Here is my code for logistic regression:
train.scaled$y <- as.factor(train.scaled$y)
logit <- glm(y ~ ., data = train.scaled, family = "binomial")
pred.train.logit <- predict(logit, newdata = train.scaled, type = "response",
                            se.fit = TRUE)
EDIT
After implementing the solution provided in the comments I did the following:
compare.train = data.frame(Actual = train.scaled$y,
                           Predicted_probability = predict(logit, type = "response"),
                           Predicted = ifelse(predict(logit, type = "response") > 0.5, 1, 0))
pred.train.logit.error <- mean(compare.train$Predicted != train$y)
pred.train.logit.error
This gave me a misclassification error of 0.9715909, which is far too high (almost every observation is misclassified), so I must have gone wrong somewhere.
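To try to see where it goes wrong, I cross-tabulated the thresholded 0/1 predictions against the actual labels. My suspicion (which I am not sure about) is that with an 11-level factor response, glm() with family = "binomial" only models the first level against the rest, so the 0/1 predictions are not on the same scale as the labels 1-11:
# how do the 0/1 predictions line up against the 11 actual classes?
table(predicted = compare.train$Predicted, actual = train.scaled$y)
Any suggestions would be appreciated!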
ANOTHER EDIT
I've gone ahead with the suggestion in the comments and used multinomial logistic regression so I can deal with the 11 classes.
library("nnet")
multi.logit.train <- multinom(y ~ ., data = train.new)
summary(multi.logit.train)
head(fitted(multi.logit.train))
predicted.train = predict(multi.logit.train, newdata=train.new, type="probs")
head(predicted.train)
I am so close here: head(predicted.train) looks like the right structure, but I still get the atomic vector error. I'm thinking I need to change the classes (1-11) to a factor type, but I am not positive. (A sketch of what I am trying to end up with is after the str() output below.)
> str(predicted.train)
 num [1:528, 1:11] 6.06e-01 4.13e-01 8.41e-03 1.71e-05 1.80e-05 ...
 - attr(*, "dimnames")=List of 2
  ..$ : chr [1:528] "1" "2" "3" "4" ...
  ..$ : chr [1:11] "1" "2" "3" "4" ...
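What I am hoping to end up with is something like the sketch below (variable names made up for the sketch). My understanding, which I would like to confirm, is that predict() with type = "class" should give the predicted labels directly, and the alternative is to take the column with the largest probability from the matrix above:
# option 1: ask predict() for the class labels directly
pred.class.train <- predict(multi.logit.train, newdata = train.new, type = "class")
mean(pred.class.train != train.new$y)
# option 2: pick, for each row, the class with the largest predicted probability
pred.class.train2 <- colnames(predicted.train)[max.col(predicted.train)]
mean(pred.class.train2 != train.new$y)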