How do I get the important features and eliminate the features that are not selected after performing PCA?

Question

I have tried to perform PCA on my dataset, but I have no idea how to get the important features and eliminate the features that are not selected. I have added a condition: if the data contains more than 10 features, perform PCA; otherwise, don't perform PCA.

from sklearn.decomposition import PCA
from sklearn import preprocessing
columns = x.columns
def Perform_PCA(x): 
    no_of_col = len(x.columns)
    percent = 90
    my_num = int((percent/100)*no_of_col)
    if no_of_col >= 10:
        pca = PCA(n_components = my_num)
        x_new = pca.fit_transform(x)
        print("More than 10 columns found Performing PCA")
        return selected_var
    else:
        print("Less than 10 columns found no PCA performed")
        return x
        
        
x = Perform_PCA(x)
x

You should check this question: https://stackoverflow.com/questions/23294616/how-to-use-scikit-learn-pca-for-features-reduction-and-know-which-features-are-d?rq=1 — Robin Thibaut, May 31 '22 at 08:26

Fabian · Answer 1 · 2022-05-31T08:44:55.347

0

In your current code you create my_num components, but only if you have 10 or more columns.

If you want to have a look and select the features yourself you could modify your code:

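 import pandas as pd
 # Fit PCA with all components so nothing is discarded yet
 pca = PCA()
 x_new = pca.fit_transform(x)
 # Share of the total variance explained by each principal component
 explained_variance = pca.explained_variance_ratio_
 print(explained_variance)
 # Loadings: one row per component, one column per original feature
 print(pd.DataFrame(pca.components_, columns=x.columns))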

This gives you the explained variance ratio of every principal component (not of every original feature), together with the weight each original column has in each component. From here you can set the bar for how many components to keep.
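One way to turn that inspection into a rule is sketched below. It assumes a 90% cumulative-variance threshold and uses made-up toy data; the column names, the threshold, and the standardisation step are illustrative choices, not part of the original code.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Hypothetical toy data: 4 independent columns plus 2 near-duplicates.
rng = np.random.default_rng(0)
base = rng.normal(size=(200, 4))
x = pd.DataFrame(
    np.column_stack([
        base,
        base[:, 0] + 0.05 * rng.normal(size=200),
        base[:, 1] + 0.05 * rng.normal(size=200),
    ]),
    columns=["a", "b", "c", "d", "a_copy", "b_copy"],
)

# PCA is sensitive to scale, so standardise the columns first (an assumption).
x_scaled = StandardScaler().fit_transform(x)
pca = PCA().fit(x_scaled)

# Smallest number of components whose cumulative ratio reaches the threshold.
threshold = 0.90
cumulative = np.cumsum(pca.explained_variance_ratio_)
n_keep = int(np.searchsorted(cumulative, threshold) + 1)

# Loadings of the kept components: rows are components, columns are features.
loadings = pd.DataFrame(
    pca.components_[:n_keep],
    columns=x.columns,
    index=[f"PC{i + 1}" for i in range(n_keep)],
)

# For each kept component, the original column with the largest absolute weight.
top_feature = loadings.abs().idxmax(axis=1)
print(n_keep)
print(top_feature)
```

The top-weighted column per component is only a rough guide to which inputs matter; every component still mixes all columns.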

edited May 31 '22 at 08:44

answered May 31 '22 at 08:13


So how do I get the column names that are selected after the PCA? This is what I wanted to know. – Dharambir Maht0 May 31 '22 at 08:35
Generally every feature is "selected", but the weight with which each one influences the PCA components differs. I adapted the code so you can see which of your columns contribute the most to each component of your PCA. Also take a look at @RobinThibaut's answer, because you might have a variable-naming error ;) – Fabian May 31 '22 at 08:41

score 0 · Answer 2 · answered May 31 '22 at 08:23

I will first review your function:

from sklearn.decomposition import PCA
from sklearn import preprocessing
columns = x.columns
def Perform_PCA(x): 
    no_of_col = len(x.columns) 
    percent = 90 
    my_num = int((percent/100)*no_of_col)
    if no_of_col >= 10:
        pca = PCA(n_components = my_num)
        x_new = pca.fit_transform(x)
        print("More than 10 columns found Performing PCA")
        return selected_var
    else:
        print("Less than 10 columns found no PCA performed")
        return x

You are performing PCA only if there are ten or more columns, but in that branch your function returns selected_var, which does not exist.

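Also, PCA does not "select features"; it transforms the input data by computing a lower-dimensional representation. Each output column is a linear combination of all the original columns, not a subset of them. If you want the reduced data, use pca.fit_transform(x) (or pca.transform(x) on an already fitted PCA), which returns an array with n_components columns.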
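A minimal sketch of that point, on made-up random data with hypothetical column names f0 to f11. It also shows that scikit-learn's PCA accepts a float n_components between 0 and 1 to keep enough components for that share of the variance, which is different from keeping 90% of the column count as my_num does.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA

# Hypothetical data: 200 rows, 12 numeric columns.
rng = np.random.default_rng(1)
x = pd.DataFrame(
    rng.normal(size=(200, 12)),
    columns=[f"f{i}" for i in range(12)],
)

# A float in (0, 1) asks for as many components as needed
# to explain at least that fraction of the variance.
pca = PCA(n_components=0.90)
x_new = pca.fit_transform(x)

# Each new column is a weighted sum of ALL centred original columns,
# so no original column is "kept" or "dropped" as such.
manual = (x.to_numpy() - pca.mean_) @ pca.components_.T

print(x.shape, x_new.shape)
print(np.allclose(x_new, manual))
```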

Here is your code modified (it would be possible to optimise it further, but I tried to change it as little as possible):

from sklearn.decomposition import PCA
from sklearn import preprocessing
columns = x.columns
def Perform_PCA(x): 
    no_of_col = len(x.columns) 
    percent = 90 
    # Number of components: 90% of the column count, rounded down
    my_num = int((percent/100)*no_of_col)

    if no_of_col >= 10:
        pca = PCA(n_components = my_num)
        # x_new is a NumPy array with my_num columns (the principal components)
        x_new = pca.fit_transform(x)
        print("More than 10 columns found Performing PCA")
        return x_new
    else:
        # Too few columns: return the input unchanged
        print("Less than 10 columns found no PCA performed")
        return x
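Note that the function above returns a NumPy array when PCA runs and the original DataFrame when it does not. If a consistent return type is wanted, one possible variant is sketched here; the name perform_pca_frame, its parameters, and the PC1, PC2, ... column labels are hypothetical and not from the original post.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA

def perform_pca_frame(x, percent=90, min_cols=10):
    """Hypothetical variant of Perform_PCA that always returns a DataFrame."""
    no_of_col = len(x.columns)
    if no_of_col < min_cols:
        # Too few columns: return the input unchanged.
        return x
    n_components = int((percent / 100) * no_of_col)
    pca = PCA(n_components=n_components)
    x_new = pca.fit_transform(x)
    # Label the outputs as components, since they are no longer original columns.
    pc_names = [f"PC{i + 1}" for i in range(n_components)]
    return pd.DataFrame(x_new, columns=pc_names, index=x.index)

# Made-up data to exercise both branches.
rng = np.random.default_rng(2)
wide = pd.DataFrame(rng.normal(size=(50, 12)), columns=[f"f{i}" for i in range(12)])
narrow = wide.iloc[:, :5]

print(perform_pca_frame(wide).shape)    # (50, 10)
print(perform_pca_frame(narrow).shape)  # (50, 5)
```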

Hope this will help you.

Thank you, sir. I got my answer and it's solved. Thank you very, very much. — Dharambir Maht0, May 31 '22 at 08:48
