Discriminant analysis in R: how to obtain the threshold weight

Question

I'm running a linear discriminant analysis with 2 variables and 2 groups in R, i.e.,

ldares <- lda(dat[,2:3], grouping=dat[,1])

Next, I would like to obtain the formula for the decision bound that separates the groups. I know that I can output the coefficients of the linear discriminant with:

coef(ldares)

However, given that the decision bound is described by:

a*v1 + b*v2 + c = 0,

how do I get the bias or threshold weight c?

Your code does not run, making it difficult for people to offer suggestions. Also, perhaps ask on the statistics forum Cross Validated. — Mark Miller, Dec 29 '12 at 13:43

score 2 · Answer 1 · answered Dec 29 '12 at 17:51

When no prior weights are given, I believe you will find that c = 0, with the priors set from the distribution of the cases across the groups. You can see that scores constructed with an implicit c = 0 produce the expected split in prediction on the iris dataset:

library(MASS)
# Keep only the two species being compared
keep <- iris[, 5] %in% c("setosa", "versicolor")
# droplevels() avoids a warning about the empty "virginica" group
ldares <- lda(iris[keep, 2:3], grouping = droplevels(iris[keep, 5]))
# Raw scores with no centering, i.e. an implicit c = 0
scores <- as.matrix(iris[keep, 2:3]) %*% coef(ldares)
# Colour each point by the sign of its raw score
with(iris[keep, ],
     plot(Sepal.Width, Petal.Length, col = c("black", "red")[1 + (scores > 0)]))


IRTFM


Thanks for your reply, @DWin. It seems to me, though, that this is an incidental feature of the data set. Try running your code with a different data set, iris1, with: `iris1 <- iris` `iris1[,2] <- 10 + iris[,2]` `iris1[,3] <- 10 + iris[,3]` You will find that the coefficients are the same, but c is not 0. It has a value somewhere around 15. – awcm0n Dec 29 '12 at 21:58
Someone correct me if I'm wrong, but I believe the answer is: c is the midpoint between the two group means on the discriminant. – awcm0n Dec 29 '12 at 22:42
The answer from Lak addresses these concerns. There is a centering step, so the c==0 point becomes the group mean. It is essentially removing the intercept from a linear regression result. – IRTFM Jun 19 '15 at 22:15
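
The shift described in the comments above can be checked directly. The following is a minimal sketch, assuming the same two-species subset of iris; the names two, two_shift, fit and fit_shift are made up for illustration and do not appear in the thread.

```r
library(MASS)

# Two-class subset of iris, as in the answer above
two <- droplevels(subset(iris, Species %in% c("setosa", "versicolor")))

# Shift both predictors by 10, as suggested in the comment
two_shift <- two
two_shift$Sepal.Width  <- two$Sepal.Width  + 10
two_shift$Petal.Length <- two$Petal.Length + 10

fit       <- lda(Species ~ Sepal.Width + Petal.Length, data = two)
fit_shift <- lda(Species ~ Sepal.Width + Petal.Length, data = two_shift)

# The coefficients should be unaffected by the shift
print(cbind(original = coef(fit)[, 1], shifted = coef(fit_shift)[, 1]))

# The predicted classes should also be unaffected
same_class <- identical(predict(fit, two)$class,
                        predict(fit_shift, two_shift)$class)

# The raw scores X %*% w, however, all move by 10 * sum(w),
# so a fixed cut at 0 cannot be the boundary for both data sets
vars      <- c("Sepal.Width", "Petal.Length")
raw       <- as.matrix(two[, vars]) %*% coef(fit)
raw_shift <- as.matrix(two_shift[, vars]) %*% coef(fit_shift)
print(range(raw_shift - raw))
```

If the coefficients match while the raw scores move, the boundary must carry a data-dependent constant, which is the point the comment makes.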

score 0 · Answer 2 · edited May 16 '14 at 14:44

You should realize that LDA is a linear combination of centered variables. So, the discriminant function is really:

\Sigma [w * (x - mean(x))]  >  0

and therefore:

\Sigma [w * x]  >  \Sigma w * mean(x)

The threshold is therefore \Sigma w * mean(x). Unfortunately, LDA doesn't report mean(x) over the entire dataset, only the two group means. But this allows us to compute the threshold in a rather intuitive way.
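
To spell out how the overall mean relates to the two group means, here is a short derivation. It assumes the two groups are the same size (equal priors), which is an assumption not stated in the answer; the symbols \bar{x} (overall mean vector) and \mu_1, \mu_2 (group mean vectors) are introduced here for illustration.

```latex
\begin{aligned}
w^\top (x - \bar{x}) > 0 \;&\Longleftrightarrow\; w^\top x > w^\top \bar{x}, \\
\bar{x} = \tfrac{1}{2}(\mu_1 + \mu_2) \;&\Longrightarrow\; w^\top \bar{x} = \tfrac{1}{2}\left(w^\top \mu_1 + w^\top \mu_2\right).
\end{aligned}
```

With unequal group sizes the overall mean is no longer the midpoint of the group means, so the two expressions would differ.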

Assuming that result is your fitted LDA object, the threshold is midway between the projections of the two class centroids onto the discriminant:

sum(result$scaling * result$means[2,] + result$scaling * result$means[1,]) / 2

P.S. Note that in the original question's a*v1 + b*v2 + c = 0, the threshold is -c.
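
As a check on the midpoint formula, here is a minimal sketch that computes the threshold and compares a manual classification against predict(). It assumes the two-species iris subset used in the other answer, where both groups have 50 cases and the priors are therefore equal; with unequal priors the agreement is not guaranteed. The names two, X, fit, w, proj_means and threshold are illustrative only.

```r
library(MASS)

two <- droplevels(subset(iris, Species %in% c("setosa", "versicolor")))
X   <- as.matrix(two[, c("Sepal.Width", "Petal.Length")])
fit <- lda(X, grouping = two$Species)

w          <- fit$scaling[, 1]     # discriminant coefficients (a, b)
proj_means <- fit$means %*% w      # each group centroid projected onto LD1
threshold  <- mean(proj_means)     # midpoint between them; this is -c

# Classify by comparing the raw score a*v1 + b*v2 with the threshold
score  <- drop(X %*% w)
upper  <- rownames(fit$means)[which.max(proj_means)]  # group on the high side
lower  <- rownames(fit$means)[which.min(proj_means)]
manual <- ifelse(score > threshold, upper, lower)

# Compare with the classes assigned by predict()
agree <- all(manual == as.character(predict(fit, X)$class))
print(c(a = w[[1]], b = w[[2]], c = -threshold))
print(agree)
```

Under the equal-prior assumption, score - threshold should also match the LD1 values returned in predict(fit, X)$x, which is the centering step mentioned in the comments on the first answer.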
