
I would like to plot the decision boundaries of LDA for a matrix with 3 input variables and 2 classes. I found code that plots the boundary when LDA is given only 2 input variables, but the code I found for 3 input variables gives an incorrect boundary.

```r
library(MASS)  # provides lda()

# Keep setosa and virginica only; droplevels() removes the now-empty
# "versicolor" level. (Subsetting after attach(iris) does not update the
# attached copy, so the plots would still show all 150 flowers.)
iris2 <- droplevels(iris[iris$Species != "versicolor", ])
cols <- c("black", "green")

# With 2 input variables

LDA <- lda(Species ~ Sepal.Length + Sepal.Width, data = iris2)
GS <- 500
x1 <- seq(min(iris2$Sepal.Length), max(iris2$Sepal.Length), length.out = GS)
x2 <- seq(min(iris2$Sepal.Width), max(iris2$Sepal.Width), length.out = GS)
x <- expand.grid(x1, x2)
newdat <- data.frame(Sepal.Length = x[, 1], Sepal.Width = x[, 2])

# Predicted class at every grid point: 1 = setosa, 2 = virginica
lda.Ghat <- as.numeric(predict(LDA, newdata = newdat)$class)

plot(iris2$Sepal.Length, iris2$Sepal.Width, col = cols[iris2$Species],
     xlab = "Sepal.Length", ylab = "Sepal.Width")
# The boundary is where the predicted class switches from 1 to 2
contour(x1, x2, matrix(lda.Ghat, GS, GS),
        levels = 1.5, add = TRUE, drawlabels = FALSE, col = "red")
legend("topright", legend = levels(iris2$Species), fill = cols)

# With 3 input variables

LDA <- lda(Species ~ Sepal.Length + Sepal.Width + Petal.Length, data = iris2)
# Reuses the 2-D grid x; Petal.Length is held fixed at its mean
newdat <- data.frame(Sepal.Length = x[, 1], Sepal.Width = x[, 2],
                     Petal.Length = mean(iris2$Petal.Length))

lda.Ghat <- as.numeric(predict(LDA, newdata = newdat)$class)

plot(iris2$Sepal.Length, iris2$Sepal.Width, col = cols[iris2$Species],
     xlab = "Sepal.Length", ylab = "Sepal.Width")
contour(x1, x2, matrix(lda.Ghat, GS, GS),
        levels = 1.5, add = TRUE, drawlabels = FALSE, col = "red")
legend("topright", legend = levels(iris2$Species), fill = cols)
```
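With two classes, the LDA decision boundary in the three input variables is a plane, so fixing `Petal.Length` at its mean in `newdat` only shows where that plane crosses one slice, while the Sepal points plotted over it come from all `Petal.Length` values. A minimal sketch, assuming exactly two classes and MASS's default plug-in rule: recover the plane `w . x = w0` from the documented components of the fitted `lda` object (`scaling`, `means`, `prior`), mirroring (as far as I can tell) the computation in `predict.lda`, then draw the plane's intersection with a few `Petal.Length` quartiles using `abline()`, with no grid. The names `iris2`, `LDA3`, `w`, `w0` and `pls` are illustrative, not from the question.

```r
library(MASS)  # lda()

# Two-class subset as in the question, with the empty level dropped
iris2 <- droplevels(iris[iris$Species != "versicolor", ])
LDA3 <- lda(Species ~ Sepal.Length + Sepal.Width + Petal.Length, data = iris2)

# Boundary plane w . x = w0 (assumes two classes and the plug-in rule)
w <- drop(LDA3$scaling)                      # LD1 coefficient per variable
p <- LDA3$prior
ctr <- colSums(p * LDA3$means)               # prior-weighted centre, as predict.lda uses
dm <- drop(scale(LDA3$means, center = ctr, scale = FALSE) %*% w)  # class means on LD1
z_star <- unname((dm[1] + dm[2]) / 2 + log(p[1] / p[2]) / (dm[2] - dm[1]))
w0 <- sum(w * ctr) + z_star                  # LD1 == z_star  <=>  w . x == w0

# For fixed Petal.Length = pl the plane reduces to the line
# Sepal.Width = (w0 - w_PL * pl) / w_SW - (w_SL / w_SW) * Sepal.Length
cols <- c("black", "green")
plot(iris2$Sepal.Length, iris2$Sepal.Width, col = cols[iris2$Species],
     xlab = "Sepal.Length", ylab = "Sepal.Width")
pls <- quantile(iris2$Petal.Length, c(0.25, 0.5, 0.75))
for (i in seq_along(pls)) {
  abline(a = (w0 - w[["Petal.Length"]] * pls[[i]]) / w[["Sepal.Width"]],
         b = -w[["Sepal.Length"]] / w[["Sepal.Width"]], col = "red", lty = i)
}
legend("topright", legend = levels(iris2$Species), fill = cols)
legend("bottomright", legend = paste("Petal.Length =", signif(pls, 3)),
       col = "red", lty = seq_along(pls))
```

Any point on one of these lines, with `Petal.Length` set to that line's value, should get a posterior of 0.5 for both classes from `predict(LDA3, ...)`, which is a quick way to check the derivation.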
  • Hi, nice question. There may be some issues with your methodology here: 1) Why drop the information in `Petal.Length` by using only its mean in `newdat`? (The predictions will be worse.) 2) Your decision boundary is based on 3D inputs, so it is a 3D decision boundary, and the plot is a poor view of it, since you are looking at a particular (maybe not the best) projection of the data. 3) You use a grid to compute the boundary, which means creating synthetic points that may not be well predicted, depending on the problem. – cbo Jun 28 '19 at 10:48
  • Hi, thanks for the comments. This is not my code, though: it is code I found on Stack Overflow for 2 input variables, and I would like a version for 3 input variables. – PMartins Jun 28 '19 at 13:04
  • After some research, it seems most people don't compute the decision boundary but only plot it. This could be tricky in more than 2 dimensions, so using PCA on the result to keep only 2 dimensions could do the trick here. I am not totally satisfied with this answer, though; I will look it up tomorrow. – cbo Jun 28 '19 at 17:04
  • Actually, on my dataset I first perform PCA and then select only the top PCs for LDA. In some cases I will have 3 or more PCs; in others, only 2. – PMartins Jul 01 '19 at 08:52
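Following the suggestion to reduce to 2 dimensions with PCA, and the workflow of running LDA on the top PCs, here is a hypothetical helper, `lda_on_pcs()` (not from the thread), that handles both cases for two classes. With `k = 2` it reuses the question's grid-and-contour code in the PC1/PC2 plane; with `k >= 3` it plots the LD1 scores instead, since with two classes the boundary on LD1 is a single threshold, computed here with the same plug-in rule that `predict.lda` appears to use. It assumes standardised PCA via `prcomp(scale. = TRUE)`; the actual PCA settings used in the thread are not stated.

```r
library(MASS)  # lda()

# Hypothetical helper: PCA on the inputs, then LDA on the top k PC scores.
# Assumes exactly two classes in y.
lda_on_pcs <- function(X, y, k, GS = 200) {
  y <- droplevels(as.factor(y))
  pc <- prcomp(X, center = TRUE, scale. = TRUE)
  S <- pc$x[, seq_len(k), drop = FALSE]        # top-k PC scores (PC1, PC2, ...)
  fit <- lda(S, grouping = y)
  cols <- c("black", "green")
  if (k == 2) {
    # Same grid + contour idea as the question's 2-variable code
    g1 <- seq(min(S[, 1]), max(S[, 1]), length.out = GS)
    g2 <- seq(min(S[, 2]), max(S[, 2]), length.out = GS)
    grid <- as.matrix(expand.grid(PC1 = g1, PC2 = g2))
    cls <- as.numeric(predict(fit, newdata = grid)$class)
    plot(S[, 1], S[, 2], col = cols[y], xlab = "PC1", ylab = "PC2")
    contour(g1, g2, matrix(cls, GS, GS), levels = 1.5,
            add = TRUE, drawlabels = FALSE, col = "red")
  } else {
    # Any k: project onto LD1, where the two-class boundary is one threshold
    p <- fit$prior
    ctr <- colSums(p * fit$means)
    dm <- drop(scale(fit$means, center = ctr, scale = FALSE) %*% fit$scaling)
    z_star <- unname((dm[1] + dm[2]) / 2 + log(p[1] / p[2]) / (dm[2] - dm[1]))
    ld1 <- predict(fit, newdata = S)$x[, 1]
    stripchart(ld1 ~ y, method = "jitter", col = cols, xlab = "LD1")
    abline(v = z_star, col = "red")
  }
  invisible(fit)
}

# Usage on the question's two-class iris subset
iris2 <- droplevels(iris[iris$Species != "versicolor", ])
X3 <- iris2[, c("Sepal.Length", "Sepal.Width", "Petal.Length")]
fit2 <- lda_on_pcs(X3, iris2$Species, k = 2)  # boundary line in the PC1/PC2 plane
fit3 <- lda_on_pcs(X3, iris2$Species, k = 3)  # LD1 scores with the threshold
```

With `k` equal to the number of inputs, the PCs are an invertible linear transform of the original variables, so the LDA classifications match those of the 3-variable fit; the LD1 plot is then a view of that same boundary rather than a slice of it.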
