How to get column names from my numpy array?

Question

I used L1-based feature selection, as in the snippet below, to select suitable columns from a pandas DataFrame X.

from sklearn.svm import LinearSVC
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectFromModel

iris = load_iris()
X, y = iris.data, iris.target

lsvc = LinearSVC(C=0.01, penalty="l1", dual=False).fit(X, y)
model = SelectFromModel(lsvc, prefit=True)
X_new = model.transform(X)

However, it is not clear to me how I can get the column names. Since X_new is a numpy array, I tried this:

X_new.dtype.names

But it returns nothing (None). So how can I find out which columns have been selected?

@Zero: In my case the number of features is not equal in the original X and X_new. I saw this solution, but I do not understand what is "feature_selector.get_support": `X_selected_df = pd.DataFrame(X_new, columns=[X.columns[i] for i in range(len(X.columns)) if feature_selector.get_support()[i]])` — ScalaBoy, Sep 03 '18 at 12:06
This is what I tried: `X_selected_df = pd.DataFrame(X_new, columns=[X.columns[i] for i in range(len(X.columns)) if SelectFromModel.get_support()[i]])`, but got `AttributeError: 'numpy.ndarray' object has no attribute 'columns'`. — ScalaBoy, Sep 03 '18 at 12:08
@LiamHealy: Ok, now I have this error: `TypeError: get_support() missing 1 required positional argument: 'self'` — ScalaBoy, Sep 03 '18 at 12:21
I solved it. I had to use `model.get_support` instead of `SelectFromModel.get_support`. — ScalaBoy, Sep 03 '18 at 12:24

score 0 · Answer 1 · answered Sep 03 '18 at 12:27

Once you have converted your data into a csv file, you will want to use pd.read_csv to get that file into a dataframe.

You can then use the columns attribute to access the columns.

Furthermore, you could call the tolist() method on it to get the columns as a list.
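A minimal sketch of the pandas route described above. The CSV content and column names here are made up for illustration, and io.StringIO stands in for a real file path so the snippet runs on its own:

```python
import io

import pandas as pd

# Hypothetical CSV content; in practice you would pass a file path to read_csv.
csv_text = "sepal_length,sepal_width,petal_length\n5.1,3.5,1.4\n4.9,3.0,1.4\n"

df = pd.read_csv(io.StringIO(csv_text))

# df.columns is a pandas Index; tolist() converts it to a plain Python list.
column_names = df.columns.tolist()
print(column_names)
```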

Alternatively, you could use Ahmad's method:

import re

# Read only the header line; the with-block closes the file afterwards.
with open('f.csv', 'r') as f:
    header = f.readline()

columns = re.sub(' +', ' ', header)  # collapse runs of spaces into one
columns = columns.strip().split(',')  # split on commas

print(columns)

EDIT: The OP solved the question by calling get_support on the fitted selector instance (model.get_support()) instead of on the class (SelectFromModel.get_support()).
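For reference, here is a rough sketch of how that fix might look end to end. It assumes the data starts out as a DataFrame (the iris array is wrapped in one here using iris.feature_names), and the names mask, selected_columns and X_selected_df are only illustrative:

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectFromModel
from sklearn.svm import LinearSVC

iris = load_iris()
# Wrap the raw array in a DataFrame so that column names are available.
X = pd.DataFrame(iris.data, columns=iris.feature_names)
y = iris.target

# Fit and transform on the underlying numpy array, as in the question.
lsvc = LinearSVC(C=0.01, penalty="l1", dual=False).fit(X.values, y)
model = SelectFromModel(lsvc, prefit=True)
X_new = model.transform(X.values)

# get_support() is called on the selector instance, not on the class.
# It returns a boolean mask with one entry per original feature.
mask = model.get_support()
selected_columns = X.columns[mask]

# Rebuild a DataFrame so the selected features keep their names.
X_selected_df = pd.DataFrame(X_new, columns=selected_columns)
print(list(selected_columns))
```

This also explains why X_new.dtype.names gives nothing: that attribute is only populated for structured arrays, and transform returns a plain ndarray, so the names have to come from the original DataFrame via the mask.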
