Access scores for observation on linear discriminants in LDA using MASS:lda()

Question

library(MASS)
example(lda)
plot(z)

How can I access all the points in z? I want to know the values of every point along LD1 and LD2 depending on their Sp (c,s,v).

Re edit on tags: The 'lda' tag got taken by a text analysis method with the same acronym (LDA) as linear discriminant analysis. — IRTFM, Dec 29 '12 at 17:00

score 5 · Accepted Answer · answered Sep 19 '12 at 14:33

What you are looking for is computed as part of the predict() method of objects of class "lda" (see ?predict.lda). It is returned as component x of the object produced by predict(z):

## follow example from ?lda
Iris <- data.frame(rbind(iris3[,,1], iris3[,,2], iris3[,,3]),
                   Sp = rep(c("s","c","v"), rep(50,3)))
set.seed(1) ## remove this line if you want it to be pseudo random
train <- sample(1:150, 75)
table(Iris$Sp[train])
## your answer may differ
##  c  s  v
## 22 23 30
z <- lda(Sp ~ ., Iris, prior = c(1,1,1)/3, subset = train)

## get the whole prediction object
pred <- predict(z)
## show first few sample scores on LDs
head(z$x)

the last line shows the first few rows of the object scores on the linear discriminants

> head(pred$x)
          LD1        LD2
40  -8.334664  0.1348578
56   2.462821 -1.5758927
85   2.998319 -0.6648073
134  4.030165 -1.4724530
30  -7.511226 -0.6519301
131  6.779570 -0.8675742

These scores can be plotted like so

plot(LD2 ~ LD1, data = pred$x)

producing the following plot (for this training sample!)

lda scores plot

Quite useful and then so simple that I can't beleive I had not seen that before. Thanks! — Federico C, Sep 20 '12 at 11:53

score 1 · Answer 2 · answered Sep 19 '12 at 13:52

When you calling the function plot(z), you are actually calling the function plot.lda - this is an S3 method. Basically, the object z has class lda:

class(z)

We can look at the actual function that is being used:

getS3method("plot", "lda")

This turns out to be rather involved. But the key points are:

x = z
Terms <- x$terms
data <- model.frame(x)
X <- model.matrix(delete.response(Terms), data)
g <- model.response(data)
xint <- match("(Intercept)", colnames(X), nomatch = 0L)
X <- X[, -xint, drop = FALSE]
means <- colMeans(x$means)
X <- scale(X, center = means, scale = FALSE) %*% x$scaling

We can no plot as before:

plot(X[,1], X[,2])

Proviso There might well be an easier way of getting what you want - I just don't know the lda function.

+1 and the extra magic source is that this is done for you in the `predict()` method of `"lda"` objects, and then some as it provides several different ways to generate the predictions. I have supplied an example in my answer. — Gavin Simpson, Sep 19 '12 at 14:34
`predict` - I should have guessed. Definitely the correct way to go. — csgillespie, Sep 19 '12 at 14:35

scrameri · Answer 3 · 2021-09-10T23:33:34.933

Some more details to the nice example by GavinSimpson

## follow example from ?lda
Iris <- data.frame(rbind(iris3[,,1], iris3[,,2], iris3[,,3]),
                   Sp = rep(c("s","c","v"), rep(50,3)))
set.seed(1)
train <- sample(1:150, 75)
z <- lda(Iris[train,-5], grouping = Iris[train,5], prior = c(1,1,1)/3)

## get the whole prediction object
pred <- predict(z)

If you want to understand the magic of predict.lda, here is how you can calculate (predict) the matrix of LD scores by hand: All you have to do is to center the original data (or new data of the same kind) using the centering vector (LDA uses the mean of group means) of the original data colMeans(z$means) and multiply each centered observation by the LD loadings (linear coefficients) stored in z$scaling.

Predict by hand

pred.self <- scale(Iris[train,-5], center = colMeans(z$means), scale = FALSE) %*% z$scaling

Verify using predict.lda()

> all.equal(pred$x, pred.self) # it's the same!
[1] TRUE

Verify using plot.lda()

plot(z, dimen = 2)
points(pred.self, col = "blue") # corresponds to plot.lda() output

image output

A closer look at the predict.lda function

getAnywhere(predict.lda)

reveals that since neither the input data nor the LD scores are stored in the lda object, the predict.lda function has to look for the input data in the R environment.

newdata <- eval.parent(z$call$x)
all.equal(newdata, Iris[train,-5]) # it's the same!

Access scores for observation on linear discriminants in LDA using MASS:lda()

3 Answers3

Linked