
In machine learning, adding features or dimensions can decrease a model’s accuracy, because there is more data to generalize over. This is known as the curse of dimensionality.

Dimensionality reduction is a way to reduce the complexity of a model and avoid overfitting. The Principal Component Analysis (PCA) algorithm compresses a dataset onto a lower-dimensional feature space to reduce the complexity of the model.

When/how should I decide that my dataset has too many features and that I should use PCA for dimensionality reduction?

Sachin Rastogi

2 Answers


The simple answer: PCA is used when we need to tackle the curse of dimensionality.

When should I use PCA?

  1. Do you want to reduce the number of variables, but aren’t able to identify variables to completely remove from consideration?
  2. Do you want to ensure your variables are independent of one another?
  3. Are you comfortable making your independent variables less interpretable?

If you answered “yes” to all three questions, then PCA is a good method to use. If you answered “no” to question 3, you should not use PCA. A good tutorial is here.
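Point 2 of the checklist above can be verified directly: the principal components of a dataset are uncorrelated with one another. A minimal sketch (using PCA via SVD on hypothetical, synthetically correlated data):

```python
import numpy as np

# Hypothetical example: 100 samples with 5 correlated features,
# built from 2 latent variables (data and sizes are illustrative).
rng = np.random.default_rng(0)
base = rng.normal(size=(100, 2))
X = np.hstack([base, base @ rng.normal(size=(2, 3))])

# PCA via SVD on the centered data.
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
scores = Xc @ Vt.T  # principal component scores

# The components are uncorrelated: their covariance matrix is diagonal.
cov = np.cov(scores, rowvar=False)
off_diag = cov - np.diag(np.diag(cov))
print(np.allclose(off_diag, 0, atol=1e-8))  # True
```

The trade-off named in question 3 is visible here too: each score column is a linear mix of all original variables, so it no longer has a direct interpretation.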

backtrack

Let me provide another view into this.

In general, you can use Principal Component Analysis for two main reasons:

  1. For compression:

    • To reduce space to store your data, for example.
    • To speed up your learning algorithm, by keeping only the principal components that explain the most variance (look at the cumulative explained variance of the components).
  2. For visualization purposes, using 2 or 3 components.
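Both uses above follow the same recipe: compute the components, then keep either enough of them to reach a variance threshold (compression) or just the first 2 (visualization). A sketch, assuming a hypothetical 10-feature dataset and an illustrative 95% variance threshold:

```python
import numpy as np

# Hypothetical dataset: 200 samples, 10 features driven by 3 latent factors.
rng = np.random.default_rng(42)
latent = rng.normal(size=(200, 3))
X = latent @ rng.normal(size=(3, 10)) + 0.05 * rng.normal(size=(200, 10))

# PCA via SVD on the centered data.
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
explained = S**2 / np.sum(S**2)       # per-component explained variance ratio
cumulative = np.cumsum(explained)

# Compression: smallest k whose cumulative explained variance >= 95%.
k = int(np.searchsorted(cumulative, 0.95) + 1)
X_compressed = Xc @ Vt[:k].T          # shape (200, k), k << 10

# Visualization: project onto the first 2 components for a scatter plot.
X_2d = Xc @ Vt[:2].T                  # shape (200, 2)
print(k, X_compressed.shape, X_2d.shape)
```

Because the synthetic data has only 3 latent factors plus small noise, the cumulative variance curve saturates after a few components, which is exactly the pattern that tells you PCA will pay off.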

ButchMonkey
Marisaz