LDA has an internal mechanism to reduce the number of features into a few important latent variables:
Like PCA, LDA uses linear combinations of the predictors to create new axes which are used for the final classification.
Unlike PCA, which ignores the labels and maximizes the total variance, LDA uses the labels and tries to maximize the differences between the groups.
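As a rough illustration of that contrast, the sketch below fits both methods to the iris measurements. The object names `pca` and `fit` are only illustrative. `prcomp()` never sees the species labels, while `lda()` requires them as the grouping, and with three groups it can return at most two discriminant axes.

```r
library(MASS)

X <- iris[, 1:4]

# PCA is unsupervised: only the predictors are passed in
pca <- prcomp(X)

# LDA is supervised: the class labels are passed as the grouping
fit <- lda(X, grouping = iris$Species)

dim(pca$rotation)  # one column of loadings per principal component
dim(fit$scaling)   # one column of coefficients per discriminant axis
```

With 4 predictors and 3 groups, PCA yields 4 components, whereas LDA is limited to 2 axes (number of groups minus one).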
Furthermore, a variable whose within-group variance falls below a tolerance threshold (option tol in MASS::lda) is treated as constant and rejected.
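A minimal sketch of how that check can surface in practice, assuming a made-up column `const` that has zero variance in every group. In my understanding `lda()` then stops with an error rather than silently dropping the column, so the call is wrapped in `tryCatch()` to capture the message instead of aborting.

```r
library(MASS)

d <- iris
d$const <- 1  # hypothetical predictor with no variance at all

# Capture the error message instead of stopping the script
res <- tryCatch(
  lda(Species ~ ., data = d),
  error = function(e) conditionMessage(e)
)
res
```

If `res` is a character string, the fit was refused; removing the offending column before calling `lda()` avoids the problem.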
The features are weighted by multiplying the (centered) raw data by the scaling coefficient matrix to get the data in the LDA-transformed space.
Petal.Width has the highest absolute LD1 value in the scaling matrix, so it carries the most weight on the first discriminant axis (note that raw coefficients also depend on the units and spread of each variable). The second LDA axis is almost not important at all: according to the proportion of trace, it accounts for only 0.88% of the between-group separation:
library(MASS)
model <- lda(Species ~ ., iris)
model
#> Call:
#> lda(Species ~ ., data = iris)
#>
#> Prior probabilities of groups:
#>     setosa versicolor  virginica
#>  0.3333333  0.3333333  0.3333333
#>
#> Group means:
#>            Sepal.Length Sepal.Width Petal.Length Petal.Width
#> setosa            5.006       3.428        1.462       0.246
#> versicolor        5.936       2.770        4.260       1.326
#> virginica         6.588       2.974        5.552       2.026
#>
#> Coefficients of linear discriminants:
#>                     LD1         LD2
#> Sepal.Length  0.8293776  0.02410215
#> Sepal.Width   1.5344731  2.16452123
#> Petal.Length -2.2012117 -0.93192121
#> Petal.Width  -2.8104603  2.83918785
#>
#> Proportion of trace:
#>    LD1    LD2
#> 0.9912 0.0088
model$scaling
#>                     LD1         LD2
#> Sepal.Length  0.8293776  0.02410215
#> Sepal.Width   1.5344731  2.16452123
#> Petal.Length -2.2012117 -0.93192121
#> Petal.Width  -2.8104603  2.83918785
Created on 2021-10-04 by the reprex package (v2.0.1)
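To make the weighting step concrete, here is a small sketch that reproduces the LDA scores by hand and recomputes the proportion of trace. It assumes the data are centered on the prior-weighted average of the group means before being multiplied by the scaling matrix, which is how `predict()` appears to do it. The names `center`, `scores` and `prop_trace` are only illustrative.

```r
library(MASS)

model <- lda(Species ~ ., iris)
X <- as.matrix(iris[, 1:4])

# Assumption: center on the prior-weighted mean of the group means
center <- colSums(model$prior * model$means)

# Centered data times the scaling matrix gives the discriminant scores
scores <- scale(X, center = center, scale = FALSE) %*% model$scaling

# Compare the manual scores with the ones returned by predict()
all.equal(scores, predict(model)$x, check.attributes = FALSE)

# Proportion of trace, derived from the singular values stored in the model
prop_trace <- model$svd^2 / sum(model$svd^2)
round(prop_trace, 4)

# Predictor with the largest absolute coefficient on LD1
names(which.max(abs(model$scaling[, "LD1"])))
```

Because the raw coefficients depend on the scale of each predictor, comparing them across predictors is more meaningful when the predictors are on similar scales, as is roughly the case for the iris measurements (all in cm).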