
As the title suggests, I am trying to make predictions using an LDA model in R. I have two sets of data that I'm working with. The first set is a series of entries associated with 16 predictor variables and 1 outcome variable (the outcome variable is the "group" that each entry belongs to, which I've assigned myself). The second set also consists of entries associated with the same 16 predictor variables, but with no outcome variable. What I would like to do is predict the group membership of the entries in the second set of data.

So far I've successfully managed to create an LDA model by separating the first dataset into a "training set" and a "test set". However, now that I have the model I don't know how I would go about predicting the group membership of the entries in my second data set.

Thanks for the help! Please let me know if any more information is required. This is my first post on Stack Overflow, so I am still learning the ropes.

George96

1 Answer


Short example based on An Introduction to Statistical Learning, chapter 4. Say you have fitted a model lda_model on a training_set, with the dependent variable Group, which you aim to predict, and the predictors Predictor1 and Predictor2:

library(MASS)
lda_model <- lda(Group ~ Predictor1 + Predictor2, data = training_set)

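You can then make predictions with the lda_model by calling the predict function on the testing_set. The same call works for any data frame that contains the same predictor columns, including a second dataset that has no outcome variable: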

lda_predictions <- predict(lda_model, testing_set)

lda_predictions then holds, in $posterior, the posterior probability that each observation is part of Group j (one column per group). The most probable group for each observation is in $class.
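To tie this back to the question, here is a minimal sketch of predicting group membership for data that has no outcome column. It assumes the built-in iris data as a stand-in for the first (labelled) dataset, and new_data is a hypothetical name for the second (unlabelled) dataset; neither comes from the original post.

```r
library(MASS)

# Illustrative data only: iris stands in for the labelled first dataset.
set.seed(1)
train_idx <- sample(nrow(iris), 100)
training_set <- iris[train_idx, ]

# Hypothetical stand-in for the unlabelled second dataset:
# same predictor columns, outcome column dropped.
new_data <- iris[-train_idx, names(iris) != "Species"]

# Fit the model on the labelled data, using all remaining columns as predictors.
lda_model <- lda(Species ~ ., data = training_set)

# Predict on the unlabelled data; the column names must match the predictors.
new_predictions <- predict(lda_model, newdata = new_data)

# Most probable group per entry, and the per-group posterior probabilities.
predicted_group <- new_predictions$class
head(new_predictions$posterior)
table(predicted_group)
```

The only requirement on new_data is that it contains the predictor columns used in the formula, with the same names and types as in the training data.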

You could then apply a threshold, for instance (but not limited to) a 50% probability. For example:

sum(lda_predictions$posterior[, 7] >= .5)

returns the number of observations for which the posterior probability of belonging to Group 7 (the seventh column of $posterior) is at least 50%.
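As a rough sketch of the thresholding idea, the snippet below selects a posterior column by group name rather than by position and flags predictions whose highest posterior is below a cutoff. The iris data, the "versicolor" group, and the 0.9 cutoff are all illustrative assumptions, not part of the original example.

```r
library(MASS)

# Illustrative data only: iris stands in for the real training and test sets.
set.seed(1)
train_idx <- sample(nrow(iris), 100)
lda_model <- lda(Species ~ ., data = iris[train_idx, ])
lda_predictions <- predict(lda_model, newdata = iris[-train_idx, ])

posterior <- lda_predictions$posterior

# Columns of $posterior are named after the group levels,
# so a group can be selected by name instead of by index.
n_versicolor <- sum(posterior[, "versicolor"] >= 0.5)

# Highest posterior probability for each observation.
max_posterior <- apply(posterior, 1, max)

# Flag predictions that clear an (arbitrary) confidence cutoff.
confident <- max_posterior >= 0.9
table(lda_predictions$class, confident)
```

Selecting by name avoids depending on the order of the factor levels, which is easy to get wrong when there are many groups.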

tavdp