```r
library(ISLR2)
library(MASS)   # lda() comes from MASS
df = Auto
df$mclass <- as.factor(ifelse(df$mpg < 20, 'low',
                       ifelse(df$mpg >= 20 & df$mpg < 27, 'medium', 'high')))
```
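As an aside, the nested `ifelse()` above can also be written with `cut()`. The sketch below assumes ISLR2 is installed; `by_ifelse` and `by_cut` are just illustrative names. It checks that both approaches assign the same class to every car. One difference worth knowing: `cut()` keeps the levels in the order given (`low`, `medium`, `high`), whereas `as.factor()` sorts them alphabetically (`high`, `low`, `medium`).

```r
library(ISLR2)

df <- Auto

# Nested ifelse() version, as in the question
by_ifelse <- ifelse(df$mpg < 20, "low",
                    ifelse(df$mpg >= 20 & df$mpg < 27, "medium", "high"))

# cut() with right = FALSE builds the intervals [-Inf, 20), [20, 27), [27, Inf)
by_cut <- cut(df$mpg,
              breaks = c(-Inf, 20, 27, Inf),
              right  = FALSE,
              labels = c("low", "medium", "high"))

# Both approaches should give the same class for every car
stopifnot(identical(as.character(by_cut), by_ifelse))
table(by_cut)
```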

I have split the dataset into a test set (model year 75) and a training set (all other years):

```r
test = df[df$year == 75, ]
test.direction = test$mclass
training = df[df$year != 75, ]
training.direction = training$mclass
```
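A quick sanity check of the split may be worth running before comparing models fitted on it. This is a minimal sketch, assuming ISLR2 is installed: the two subsets should partition `df`, with no shared rows and nothing left out.

```r
library(ISLR2)

df <- Auto
df$mclass <- as.factor(ifelse(df$mpg < 20, "low",
                       ifelse(df$mpg >= 20 & df$mpg < 27, "medium", "high")))

test     <- df[df$year == 75, ]
training <- df[df$year != 75, ]

# The two subsets should partition df: no shared rows, nothing missing
stopifnot(nrow(test) + nrow(training) == nrow(df))
stopifnot(length(intersect(rownames(test), rownames(training))) == 0)

# Share of the data held out, and class balance in each subset
nrow(test) / nrow(df)
table(test$mclass)
table(training$mclass)
```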

I fitted an LDA model on the entire dataset:

```r
lda1 = lda(mclass ~ acceleration + displacement + horsepower + weight, df)
```

I then used only the test set to check the predictions:

```r
table(predict(lda1, test)$class, test.direction)
```
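The confusion table can be reduced to a single test accuracy. This is a sketch assuming ISLR2 and MASS are installed; `conf1` and `acc1` are illustrative names. Note that `lda1` was fitted on all of `df`, which includes the year-75 rows, so for those cars this is not a true out-of-sample estimate.

```r
library(ISLR2)
library(MASS)

df <- Auto
df$mclass <- as.factor(ifelse(df$mpg < 20, "low",
                       ifelse(df$mpg >= 20 & df$mpg < 27, "medium", "high")))
test <- df[df$year == 75, ]

lda1  <- lda(mclass ~ acceleration + displacement + horsepower + weight, data = df)
pred1 <- predict(lda1, test)$class

# Predicted and actual classes share the same levels, so the table is square
conf1 <- table(predicted = pred1, actual = test$mclass)
conf1

# Fraction of test cars whose class was predicted correctly
acc1 <- sum(diag(conf1)) / sum(conf1)
acc1
```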

I then redid the LDA using just the training set (everything not in the test set):

```r
lda2 = lda(mclass ~ acceleration + displacement + horsepower + weight, data = training)
```

and redid the prediction on the test data:

```r
table(predict(lda2, test)$class, test.direction)
```
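To confirm that the two fits really did see different data, one can compare the number of observations (`N`), the priors and the class means stored in each `lda` object. A sketch under the same assumption that ISLR2 and MASS are installed. The discriminant coefficients (`$scaling`) could be compared as well, but discriminant directions are only defined up to sign, so they are less convenient to compare directly.

```r
library(ISLR2)
library(MASS)

df <- Auto
df$mclass <- as.factor(ifelse(df$mpg < 20, "low",
                       ifelse(df$mpg >= 20 & df$mpg < 27, "medium", "high")))
test     <- df[df$year == 75, ]
training <- df[df$year != 75, ]

f    <- mclass ~ acceleration + displacement + horsepower + weight
lda1 <- lda(f, data = df)
lda2 <- lda(f, data = training)

# Observations used by each fit: lda2 excludes the year-75 cars
c(lda1 = lda1$N, lda2 = lda2$N)

# Class priors (by default, the class proportions in the fitting data)
rbind(lda1 = lda1$prior, lda2 = lda2$prior)

# Change in the estimated class means of each predictor
round(lda1$means - lda2$means, 3)
```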

The results of both predictions are identical, even though the two LDA models were fitted on different datasets. I would have expected them to differ. Why are they the same?

user4891693
  • The second lda is similar to the first because no subsetting was done. You should have `subset = !test`, not `subset != test`, that is, if `test` is a logical vector. – Onyambu Apr 06 '23 at 01:21
  • Apologies, there was a typo in my post; my code was already correct, as per your message. – user4891693 Apr 06 '23 at 01:33
  • Without a reproducible example, all we can do is throw out random guesses. – Axeman Apr 06 '23 at 01:37
  • Note that `subset != test` and `subset = !test` are two different things. What is `test`? If `sum(df$test)` is 0, then the two lda fits will be exactly the same. – Onyambu Apr 06 '23 at 01:37
  • @Axeman I've updated the post to hopefully allow reproduction. – user4891693 Apr 06 '23 at 03:12
  • The predictions are similar, but different, see e.g. `plot(predict(lda1,test)$posterior[, 1], predict(lda2,test)$posterior[, 1])`. The models are similar since they have almost the same data. They are so similar that when you just take the most likely predicted class, they give the same answers. This won't always be the case, but it is somewhat likely to happen in small datasets. – Axeman Apr 06 '23 at 16:37
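Following the comment above, here is a sketch that quantifies how much the posteriors move between the two models, again assuming ISLR2 and MASS are installed (`max_diff` and `top2_gap` are illustrative names). `predict()` on an `lda` fit assigns the class with the largest posterior, so small posterior shifts leave the confusion table unchanged unless some test car sits close to a class boundary; `top2_gap` measures how close each car is.

```r
library(ISLR2)
library(MASS)

df <- Auto
df$mclass <- as.factor(ifelse(df$mpg < 20, "low",
                       ifelse(df$mpg >= 20 & df$mpg < 27, "medium", "high")))
test     <- df[df$year == 75, ]
training <- df[df$year != 75, ]

f    <- mclass ~ acceleration + displacement + horsepower + weight
lda1 <- lda(f, data = df)
lda2 <- lda(f, data = training)

p1 <- predict(lda1, test)
p2 <- predict(lda2, test)

# Largest change in any class posterior across the test cars
max_diff <- max(abs(p1$posterior - p2$posterior))
max_diff

# Per test car, the gap between the top two posteriors under lda1:
# a gap larger than the posterior shift means the predicted class cannot flip
top2_gap <- apply(p1$posterior, 1, function(p) {
  s <- sort(p, decreasing = TRUE)
  s[[1]] - s[[2]]
})
min(top2_gap)

# Number of test cars whose predicted class differs between the two models
sum(p1$class != p2$class)
```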

0 Answers