
I have created a logistic regression model using the built-in iris dataset in R:

# Includes iris dataset.
library(datasets)

# Dummy variable to predict.
iris$dummy.virginica.iris <- 0
iris$dummy.virginica.iris[iris$Species == 'virginica'] <- 1
iris$dummy.virginica.iris

# Logistic regression model.
glmfit <- glm(dummy.virginica.iris ~ Petal.Width,
              data = iris,
              family = 'binomial')
summary(glmfit)

How would I create a classifier based on this model with a suitable cut-off value such as 0.5? Any suggestions or help would be greatly appreciated.

Lynda

1 Answer


You want to use the predict function with type = "response" to get the predicted probability that each row belongs to species virginica, then compare that probability to your cut-off:

glmfit.pred <- predict(glmfit, type="response")
virginica <- glmfit.pred > 0.5
table(iris$Species, virginica)
#             virginica
#              FALSE TRUE
#   setosa        50    0
#   versicolor    48    2
#   virginica      4   46

So in this example, 46 of 50 virginica specimens were correctly classified and 4 were missed. Of the 50 versicolor specimens, 2 were mistakenly classified as virginica and 48 were correctly classified as not virginica. All 50 setosa specimens were correctly classified as not virginica.
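
If you want to reuse the cut-off on new measurements, one option is to wrap the two steps in a small function. This is only a sketch: classify_virginica, iris2 and new_flowers are names made up for illustration, and the model is refit on a copy of iris so the block runs on its own.

```r
library(datasets)

# Refit the model from the question on a copy of iris.
iris2 <- iris
iris2$dummy.virginica.iris <- as.integer(iris2$Species == "virginica")
glmfit <- glm(dummy.virginica.iris ~ Petal.Width,
              data = iris2, family = "binomial")

# Hypothetical helper (not part of base R): returns TRUE when the
# predicted probability of virginica exceeds the cut-off.
classify_virginica <- function(model, newdata, cutoff = 0.5) {
  prob <- predict(model, newdata = newdata, type = "response")
  prob > cutoff
}

# Two made-up flowers described only by petal width (cm).
new_flowers <- data.frame(Petal.Width = c(1.0, 2.3))
classify_virginica(glmfit, new_flowers)
```

Raising or lowering cutoff trades missed virginica specimens against false alarms, so 0.5 is a reasonable default rather than a required value.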

dcarlson
  • Thanks so much for your help @dcarlson! I am only learning, so is "classifier" even the right name for this? Would I compare the means of species and virginica to get the accuracy of the classifier, or is there a more effective way to check its accuracy? – Lynda Dec 30 '19 at 08:31
  • Logistic regression can be used to predict the class an observation belongs to. It works only on dichotomous groups, in this case _virginica_ vs not _virginica_. Other methods, such as discriminant functions, can predict membership in more than 2 groups. Here the accuracy is the number of correct predictions (46 + 50 + 48 = 144) divided by the total number of predictions (150), which gives 96% accuracy. By comparison, guessing one of the three species at random would be right only about 33% of the time. – dcarlson Dec 30 '19 at 20:04
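
The accuracy arithmetic in the comment above can also be done in R rather than by hand. A minimal sketch, assuming the same model and 0.5 cut-off as in the answer; iris2, predicted, actual and conf_mat are illustrative names, not from the original post.

```r
library(datasets)

# Refit the model from the question on a copy of iris.
iris2 <- iris
iris2$dummy.virginica.iris <- as.integer(iris2$Species == "virginica")
glmfit <- glm(dummy.virginica.iris ~ Petal.Width,
              data = iris2, family = "binomial")

# Predicted class at the 0.5 cut-off, and the true class.
predicted <- predict(glmfit, type = "response") > 0.5
actual <- iris2$dummy.virginica.iris == 1

# 2 x 2 confusion matrix: rows are the truth, columns the prediction.
conf_mat <- table(actual, predicted)
conf_mat

# Accuracy = correct predictions (the diagonal) / all predictions.
accuracy <- sum(diag(conf_mat)) / sum(conf_mat)
accuracy
```

With the counts shown in the answer, this should come out at 144 / 150 = 0.96. Note that this is accuracy on the same data used to fit the model, so it is likely to be somewhat optimistic for new flowers.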