
I have a question about evaluating the predictions of a logistic regression model. I am fairly new to this, so please bear with me. First I will show what I have done with LDA, because I want to end up with a similar "misclassification rate" when I am done with my logistic regression.

install.packages("ElemStatLearn")
library(ElemStatLearn)
# training data
train = vowel.train
# we only need the first three columns: y and the first two predictors (x.1, x.2)
train.new = train[1:3]


# test data
test = vowel.test
test.new = test[1:3]

# normalizing the training data (0 mean and sd 1)
my_scale <- function(x) sweep(sweep(x, 2, colMeans(x)), 2, apply(x, 2, sd), "/")
train.scaled = my_scale(train.new[,2:3])
train.scaled = cbind(y = train.new[,1], train.scaled)
test.scaled = my_scale(test[,2:11])
test.scaled = cbind(y = test[,1], test.scaled)
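
As a side note, I believe base R's scale() does the same centering and scaling as my_scale(), so this should be an equivalent way to sanity-check the normalization (just a sketch, not something I have swapped into the pipeline; train.scaled.alt is only a throwaway name):

# base R equivalent of my_scale(): center each column to mean 0 and scale to sd 1
train.scaled.alt = as.data.frame(scale(train.new[, 2:3]))
head(train.scaled.alt)        # should match head(train.scaled[, 2:3])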

# LDA
library(MASS)
train.lda = lda(y~ ., data = train.scaled)
train.lda.values <- predict(train.lda, newdata=train.scaled)
train.lda.rate <- mean(train.lda.values$class != train.scaled[,1])
train.lda.rate

test.lda.values <- predict(train.lda, newdata=test.scaled)
test.lda.rate <- mean(test.lda.values$class != test.scaled[,1])
test.lda.rate

This gives a train.lda.rate = 0.5265152 and test.lda.rate = 0.461039.
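
For reference, I think the same objects can also be cross-tabulated to see where the LDA errors come from (a quick sketch using the fits above):

# confusion tables for the LDA fit (rows = predicted class, columns = actual class)
table(Predicted = train.lda.values$class, Actual = train.scaled$y)
table(Predicted = test.lda.values$class, Actual = test.scaled$y)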

Now I basically want to pull the same kind of misclassification rate out of logistic regression, but I know that after I use predict() on my logistic regression model there is no $class component. So I am wondering how to find the predicted classes, so that I can check whether they are equal to the original classes and obtain the misclassification rate like I did above.

Here is my code for logistic regression:

train.scaled$y <- as.factor(train.scaled$y)
logit <- glm(y ~ ., data = train.scaled, family = "binomial")
pred.train.logit <- predict(logit, newdata = train.scaled, type = "response", se.fit = TRUE)
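
If I am reading ?predict.glm correctly, asking for standard errors means the result is a list rather than a plain vector, so the predicted probabilities should sit in the $fit element (a quick check, just to make sure I am looking at the right thing):

str(pred.train.logit)         # list with $fit, $se.fit and $residual.scale
head(pred.train.logit$fit)    # fitted probabilities on the training data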

EDIT

After implementing the solution provided in the comments I did the following:

compare.train = data.frame(Actual = train.scaled$y,
                           Predicted_probability = predict(logit, type = "response"),
                           Predicted = ifelse(predict(logit, type = "response") > 0.5, 1, 0))

pred.train.logit.error <- mean(compare.train$Predicted != train$y) 
pred.train.logit.error

Which gave me a misclassification error of 0.9715909. This seems a bit high and makes me think I went wrong somewhere. Any suggestions would be appreciated!
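
In case it helps with diagnosing this, here is a quick cross-tabulation of what I am actually comparing (just a sketch; I suspect the issue is that Predicted only takes the values 0/1 while train$y runs from 1 to 11):

# rows = thresholded 0/1 predictions, columns = the 11 actual classes
table(Predicted = compare.train$Predicted, Actual = train$y)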

ANOTHER EDIT

I've gone ahead with the suggestion in the comments and used multinomial logistic regression so I can deal with the 11 classes.

library("nnet")
multi.logit.train <- multinom(y ~ ., data = train.new)
summary(multi.logit.train)
head(fitted(multi.logit.train))
predicted.train = predict(multi.logit.train, newdata = train.new, type = "probs")
head(predicted.train)

I am so close here because head(predicted.train) looks like the right structure; I just get the atomic vector error. I'm thinking I need to change the classes (1-11) to a factor type, but I am not positive.

> str(predicted.train)
 num [1:528, 1:11] 6.06e-01 4.13e-01 8.41e-03 1.71e-05 1.80e-05 ...
 - attr(*, "dimnames")=List of 2
  ..$ : chr [1:528] "1" "2" "3" "4" ...
  ..$ : chr [1:11] "1" "2" "3" "4" ...
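
From the nnet documentation, I think I can either ask predict() for the class labels directly with type = "class", or take the most probable column from the probability matrix above. A sketch of what I am planning to try (not run yet, so the object names below are just my own placeholders):

# option 1: let predict() return the class labels directly
predicted.class = predict(multi.logit.train, newdata = train.new, type = "class")

# option 2: take the column with the highest probability for each row
predicted.class2 = colnames(predicted.train)[max.col(predicted.train)]

# misclassification rate, same idea as with LDA
mean(predicted.class != train.new$y)
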
  • the answer to your question is in the link I provided in your other question http://stackoverflow.com/questions/43440868/error-when-calculating-prediction-error-for-logistic-regression-model#comment73938904_43440868 – B Williams Apr 16 '17 at 22:04
  • I know I need to threshold the predicted posterior probabilities to get my yhat, I'm just not sure exactly how to implement this in R. – Chris95 Apr 16 '17 at 22:09
  • See the [second-to-last comment in this SO question](http://stackoverflow.com/questions/43108528/comparing-predicted-values-to-actual-values-for-logistic-regression). – eipi10 Apr 16 '17 at 22:37
  • Awesome @eipi10 that was a perfect response, completely answered my question! – Chris95 Apr 16 '17 at 22:45
  • @eipi10 I implemented your solution, however it seems that when I do `pred.train.logit.error <- mean(compare.train$Predicted != train$y)` I get a misclassification error of 0.9715909. I am going to add what I did to the end of my question so you can see it with my old code. – Chris95 Apr 16 '17 at 22:56
  • FYI - I understand what the issue is, just need to fix it. My compare.train$Predicted are 1s and 0s (rounded up and down from the code you provided me with). However, the original data$Y or in my case train.new$y is grouped by 11 classes. So I want the compare.train$Predicted to tell me which class each prediction is in (a class from 1-11 instead of either a 0 or a 1). – Chris95 Apr 17 '17 at 00:43
  • Then it looks like you need multinomial logistic regression or another classification model (e.g., random forest, support vector machine) that can handle more than two classes. – eipi10 Apr 17 '17 at 03:59
  • I used multinomial logistic regression like you suggested, I am so close but just a little bit off from my final answer. I edited the post again with the updated model. – Chris95 Apr 17 '17 at 04:48
