
I used the L1-based feature selection shown here to select suitable columns from a pandas DataFrame X.

from sklearn.svm import LinearSVC
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectFromModel

iris = load_iris()
X, y = iris.data, iris.target

lsvc = LinearSVC(C=0.01, penalty="l1", dual=False).fit(X, y)
model = SelectFromModel(lsvc, prefit=True)
X_new = model.transform(X)

However, it is not clear to me how I can get the column names. Since X_new is a NumPy array, I tried this:

X_new.dtype.names

But it returns nothing. So, how can I actually understand which columns have been selected?

ScalaBoy
  • Have you seen https://stackoverflow.com/a/29907472/4764434? – Zero Sep 03 '18 at 12:00
  • @Zero: In my case the number of features is not equal in the original X and X_new. I saw this solution, but I do not understand what is "feature_selector.get_support": `X_selected_df = pd.DataFrame(X_new, columns=[X.columns[i] for i in range(len(X.columns)) if feature_selector.get_support()[i]])` – ScalaBoy Sep 03 '18 at 12:06
  • This is what I tried: `X_selected_df = pd.DataFrame(X_new, columns=[X.columns[i] for i in range(len(X.columns)) if SelectFromModel.get_support()[i]])`, but got `AttributeError: 'numpy.ndarray' object has no attribute 'columns'`. – ScalaBoy Sep 03 '18 at 12:08
  • Are you using a csv data file? – liam Sep 03 '18 at 12:16
  • No, I am using `iris = load_iris()`. No csv. – ScalaBoy Sep 03 '18 at 12:18
  • try using a csv file so that you can do `pd.read_csv` – liam Sep 03 '18 at 12:20
  • @LiamHealy: Ok, now I have this error: `TypeError: get_support() missing 1 required positional argument: 'self'` – ScalaBoy Sep 03 '18 at 12:21
  • I solved it. I had to use `model.get_support` instead of `SelectFromModel.get_support`. – ScalaBoy Sep 03 '18 at 12:24

1 Answer


Once you have converted your data into a CSV file, use pd.read_csv to load that file into a DataFrame.

You can then use the columns attribute to access the columns.

Furthermore, you could call the to_list method on the columns (X.columns.to_list()) to get the column names as a plain list.
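As a minimal sketch of the two steps above: the CSV text and its column names below are made up for illustration, and io.StringIO stands in for a real file on disk so the snippet runs on its own.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for a file on disk
csv_text = (
    "sepal_length,sepal_width,petal_length\n"
    "5.1,3.5,1.4\n"
    "4.9,3.0,1.4\n"
)

# read_csv accepts a path or any file-like object
df = pd.read_csv(io.StringIO(csv_text))

column_index = df.columns           # a pandas Index of column labels
column_list = df.columns.to_list()  # the same labels as a plain Python list

print(column_list)
```

With a real file you would pass its path to pd.read_csv instead of the StringIO object.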

Alternatively, you could use Ahmad's method:

import re

# Read only the header row of the CSV file
with open('f.csv', 'r') as f:
    header = f.readline()

columns = re.sub(' +', ' ', header)  # collapse runs of spaces into a single space
columns = columns.strip().split(',')  # split the header on commas

print(columns)

EDIT: The OP solved the problem by calling get_support on the fitted selector instance (model.get_support()) rather than on the class (SelectFromModel.get_support()).
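For reference, here is a rough sketch of what that fix might look like with the iris example from the question. It assumes the column names come from iris.feature_names, since iris.data is a plain NumPy array with no columns attribute. The names feature_names, mask and selected_names are only illustrative.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectFromModel
from sklearn.svm import LinearSVC

iris = load_iris()
X, y = iris.data, iris.target
# Assumption: use the dataset's own feature names as column labels
feature_names = np.array(iris.feature_names)

lsvc = LinearSVC(C=0.01, penalty="l1", dual=False).fit(X, y)
model = SelectFromModel(lsvc, prefit=True)
X_new = model.transform(X)

# get_support() is called on the instance, not on the SelectFromModel class;
# it returns a boolean mask with one entry per original feature
mask = model.get_support()
selected_names = feature_names[mask].tolist()

# Rebuild a DataFrame so the selected columns keep their names
X_selected_df = pd.DataFrame(X_new, columns=selected_names)
print(selected_names)
```

If X is already a DataFrame, the same mask can be applied as X.columns[model.get_support()].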

liam