Questions tagged [feature-selection]

In machine learning, feature selection is the process of selecting a subset of the most relevant features to use when constructing a model.

Feature selection is an important step for removing irrelevant or redundant features from the data. For more details, see Wikipedia.

1533 questions
6
votes
1 answer

How to get attribute list from fitted model in Scikit-learn?

Is there any way to get the list of features (attributes) used by a fitted model in Scikit-learn (or the whole table of training data that was used)? I am using some preprocessing such as feature selection, and I would like to know which features were selected and features…
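A minimal sketch of one way to recover the selected column names, assuming the selection step is a scikit-learn selector such as SelectKBest. The toy data and the names f0 to f3 are invented for illustration and are not from the question.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif

# Hypothetical toy data: only "f0" and "f2" carry signal about the label.
rng = np.random.RandomState(0)
feature_names = np.array(["f0", "f1", "f2", "f3"])
y = np.array([0, 1] * 50)
X = rng.normal(size=(100, 4))
X[:, 0] += 3 * y
X[:, 2] -= 3 * y

selector = SelectKBest(score_func=f_classif, k=2).fit(X, y)

# get_support() returns a boolean mask aligned with the input columns.
mask = selector.get_support()
selected = feature_names[mask].tolist()
dropped = feature_names[~mask].tolist()
print("selected:", selected)
print("dropped:", dropped)
```

The same mask approach should apply to other selectors that expose get_support(), as long as the list of names is kept in the same order as the columns passed to fit.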
6
votes
0 answers

How to apply filter based feature selection for logistic regression in R's caret package?

I am trying to apply filter-based feature selection in the caret package for logistic regression. I was successful at using the sbf() function for random forest and LDA models (using rfSBF and ldaSBF respectively). The way I modified lmSBF is as follows: #…
exAres
6
votes
2 answers

python: How to get real feature name from feature_importances

I am using Python's sklearn random forest (ensemble.RandomForestClassifier) to do classification and am using feature_importances_ to find the significant features for the classifier. Now my code is: for trip in database: …
gladys0313
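A small sketch of how importances can be paired with names, assuming the names are kept in a list in the same order as the columns of the training matrix. The feature names and data here are hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.RandomState(0)
feature_names = ["speed", "distance", "noise"]  # hypothetical names
y = np.array([0, 1] * 100)
X = rng.normal(size=(200, 3))
X[:, 1] += 4 * y  # only "distance" is informative in this toy data

clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

# feature_importances_ follows the column order of X, so zip it with the names.
ranked = sorted(
    zip(feature_names, clf.feature_importances_),
    key=lambda pair: pair[1],
    reverse=True,
)
for name, score in ranked:
    print(f"{name}: {score:.3f}")
```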
6
votes
2 answers

feature selection in wrapper method and information filtering?

I came across an example in an old midterm exam by Tom Mitchell, as follows: Consider learning a classifier in a situation with 1000 features total. 50 of them are truly informative about class. Another 50 features are direct copies of the…
6
votes
1 answer

Vowpal Wabbit ignore linear terms, only keep interaction terms

I have a Vowpal Wabbit file with two namespaces, for example:
1.0 |A snow |B ski:10
0.0 |A snow |B walk:10
1.0 |A clear |B walk:10
0.0 |A clear |B walk:5
1.0 |A clear |B walk:100
1.0 |A clear |B walk:15
Using -q AB, I can get the interaction…
6
votes
1 answer

How can sklearn select categorical features based on feature selection

I want to run feature selection on data with several categorical variables. I have used get_dummies in pandas to generate the sparse indicator matrix for these categorical variables. My question is how sklearn knows that one specific…
MYjx
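As far as I know, scikit-learn scores each dummy column as an independent feature and has no notion of which original variable it came from. A rough sketch of one workaround is to score the dummy columns and then sum the scores per original variable. The toy data, the "__" separator and the idea of summing chi-squared scores are assumptions made for illustration, not an established rule.

```python
import pandas as pd
from sklearn.feature_selection import chi2

# Hypothetical toy data: "color" determines the label, "size" mostly does not.
df = pd.DataFrame({
    "color": ["red", "blue", "red", "green", "blue", "red", "green", "blue"],
    "size": ["S", "L", "S", "L", "S", "L", "S", "L"],
})
y = [1, 0, 1, 0, 0, 1, 0, 0]

# A distinctive separator makes it easy to map dummy columns back to their source.
X = pd.get_dummies(df, prefix_sep="__").astype(int)

scores, _ = chi2(X, y)
per_column = pd.Series(scores, index=X.columns)

# Aggregate the per-dummy scores by the original variable name.
origin = pd.Index([col.split("__")[0] for col in X.columns])
per_variable = per_column.groupby(origin).sum()
print(per_variable.sort_values(ascending=False))
```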
6
votes
1 answer

Visual Studio 2013 Optional Features to Install

I am installing Visual Studio 2013 Professional edition on my development box and have a question about which features I need to install. I am going to develop an MVC or Web Forms web application which communicates with SQL Server 2005. I want to install…
msbyuva
6
votes
1 answer

How to rank features by their importance in a Weka classifier?

I used Weka to successfully build a classifier. I would now like to evaluate how effective or important my features are. For this I use AttributeSelection, but I don't know how to output the different features with their corresponding importance. I…
6
votes
2 answers

Using Bhattacharyya Distance for feature selection

I have a set of 240 features extracted using image processing. The objective is to classify test cases into 7 different classes after training. For each class there are about 60 observations (viz., I have around 60 feature vectors for each class with…
Sohaib
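A minimal sketch of ranking features by Bhattacharyya distance, assuming each feature is modelled as a univariate Gaussian within each class and only two classes are compared. With 7 classes, one option would be to average the distance over all class pairs. The data below is synthetic and uses 5 features rather than 240.

```python
import numpy as np

def bhattacharyya_gaussian(a, b):
    """Bhattacharyya distance between two 1-D samples, each modelled as a Gaussian."""
    mu_a, mu_b = np.mean(a), np.mean(b)
    var_a, var_b = np.var(a, ddof=1), np.var(b, ddof=1)
    # Term that grows when the two variances differ.
    term_var = 0.25 * np.log(0.25 * (var_a / var_b + var_b / var_a + 2.0))
    # Term that grows when the two means are far apart.
    term_mean = 0.25 * (mu_a - mu_b) ** 2 / (var_a + var_b)
    return term_var + term_mean

# Synthetic example: 60 observations per class, only feature 3 separates the classes.
rng = np.random.RandomState(0)
class_a = rng.normal(size=(60, 5))
class_b = rng.normal(size=(60, 5))
class_b[:, 3] += 3.0

distances = np.array([
    bhattacharyya_gaussian(class_a[:, j], class_b[:, j]) for j in range(5)
])
ranking = np.argsort(distances)[::-1]  # most separating feature first
print(ranking, distances.round(3))
```

Scoring features one at a time ignores redundancy between them, so two highly correlated features can both rank near the top.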
6
votes
1 answer

glmulti Oversized candidate set

Error message: SYSTEM: win7/64bit/ultimate/16gb-real-ram plus virtual memory, memory.limit(32000) What does this error message mean? In glmulti(y = "y", data = mydf, xr = c("x1", : !Oversized candidate set. mydf has 3.6mm rows & 150…
Yu Le
5
votes
1 answer

Feature Importance with SVR

I would like to plot feature importance with SVR, but I don't know whether that is possible with support vector regression. Here is my code:
from sklearn.svm import SVR
C=1e3
svr_lin = SVR(kernel="linear", C=C)
y_lin = svr_lin.fit(X,Y).predict(X)
scores =…
Juan Carlos
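For a linear kernel, one common approach is to read the fitted weights from coef_ and use their absolute values as a rough importance measure. This is a sketch on synthetic data with hypothetical feature names. The value of C is arbitrary, and the magnitudes are only comparable when the features are on similar scales.

```python
import numpy as np
from sklearn.svm import SVR

rng = np.random.RandomState(0)
feature_names = ["x0", "x1", "x2"]  # hypothetical names
X = rng.normal(size=(120, 3))
y = 5.0 * X[:, 0] - 2.0 * X[:, 1] + 0.1 * rng.normal(size=120)

svr_lin = SVR(kernel="linear", C=10.0).fit(X, y)

# coef_ is only available with kernel="linear"; its shape is (1, n_features).
weights = svr_lin.coef_.ravel()
importance = np.abs(weights)
order = np.argsort(importance)[::-1]
for i in order:
    print(f"{feature_names[i]}: {importance[i]:.3f}")
```

With non-linear kernels there is no coef_ attribute, so a model-agnostic method such as permutation importance would be needed instead.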
5
votes
2 answers

What should we do with highly correlated features?

In my data set, two features, C1 and C2, are highly correlated. I did the following steps. Could you please let me know whether they are correct and make sense? Do you have a better approach? First I used a linear model to find the fitted line: C1=a*C2+b from sklearn…
user12904074
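One simple alternative, sketched here under the assumption that dropping one feature of each highly correlated pair is acceptable, is to threshold the absolute correlation matrix. The data is synthetic, the extra column C3 is hypothetical, and the 0.95 cut-off is arbitrary.

```python
import numpy as np
import pandas as pd

rng = np.random.RandomState(0)
c2 = rng.normal(size=200)
df = pd.DataFrame({
    "C1": 2.0 * c2 + 1.0 + 0.05 * rng.normal(size=200),  # nearly linear in C2
    "C2": c2,
    "C3": rng.normal(size=200),  # hypothetical unrelated feature
})

threshold = 0.95  # arbitrary cut-off
corr = df.corr().abs()

# Keep only the upper triangle so that each pair is examined once.
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))

# A column is dropped if it is highly correlated with any earlier column.
to_drop = [col for col in upper.columns if (upper[col] > threshold).any()]
reduced = df.drop(columns=to_drop)
print("dropped:", to_drop)
print("kept:", list(reduced.columns))
```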
5
votes
2 answers

Pandas dataframe: divide features into groups of high correlation

I have a dataframe with over 280 features. I ran a correlation map to detect groups of features that are highly correlated. Now I want to divide the features into groups, such that each group will be a "red zone", meaning each group will have features…
Cranjis
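One possible way to form such groups, sketched here, is to treat features as nodes, connect two features when their absolute correlation is at least a threshold, and take the connected components. The helper correlation_groups, the threshold of 0.8 and the toy columns are all made up for this example.

```python
import numpy as np
import pandas as pd

def correlation_groups(df, threshold=0.8):
    """Group columns whose absolute correlation links them, directly or through a chain."""
    corr = df.corr().abs().to_numpy()
    n = corr.shape[0]
    parent = list(range(n))  # union-find forest over column positions

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if corr[i, j] >= threshold:
                parent[find(i)] = find(j)

    groups = {}
    for i, col in enumerate(df.columns):
        groups.setdefault(find(i), []).append(col)
    return list(groups.values())

rng = np.random.RandomState(0)
a = rng.normal(size=300)
b = rng.normal(size=300)
df = pd.DataFrame({
    "a": a,
    "a2": a + 0.1 * rng.normal(size=300),
    "b": b,
    "b2": b + 0.1 * rng.normal(size=300),
    "c": rng.normal(size=300),
})
groups = correlation_groups(df, threshold=0.8)
print(groups)
```

Connected components can chain features together: if A correlates with B and B with C, all three land in one group even when A and C are only weakly correlated. Hierarchical clustering on the correlation matrix is an alternative when tighter groups are wanted.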
5
votes
1 answer

How to select best features in Dataframe using the Information Gain measure in scikit-learn

I want to identify the 10 best features of a Dataframe using the Information Gain measure (Mutual Info in scikit-learn) and display them in a table (in ascending order according to the score obtained by the Information Gain). In this example,…
Lynn
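A short sketch of this with mutual_info_classif, assuming a classification target. The 15 synthetic columns f0 to f14 are placeholders for the real DataFrame, and only f4 and f9 are constructed to be informative.

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import mutual_info_classif

rng = np.random.RandomState(0)
y = np.array([0, 1] * 150)
X = pd.DataFrame(
    rng.normal(size=(300, 15)),
    columns=[f"f{i}" for i in range(15)],
)
X["f4"] += 3 * y  # informative by construction
X["f9"] -= 3 * y  # informative by construction

# The estimator is randomised, so fix random_state for repeatable scores.
mi = mutual_info_classif(X, y, random_state=0)

table = (
    pd.DataFrame({"feature": X.columns, "mutual_info": mi})
    .nlargest(10, "mutual_info")   # keep the 10 best features
    .sort_values("mutual_info")    # display in ascending order of score
    .reset_index(drop=True)
)
print(table)
```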
5
votes
3 answers

Correlation coefficient explanation--Feature Selection

How do we determine which variables to remove from our model based on the correlation coefficient? See the example below. Top 10 Absolute Correlations: Variable 1 Variable 2 Correlation Value pdays pmonths …
Hell Boy
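One heuristic, sketched here with synthetic values, is to list the pairs by absolute correlation and, for each pair above a cut-off, drop the variable that is less correlated with the target. Only the names pdays and pmonths come from the question. The numbers, the column "other", the target and the 0.9 cut-off are assumptions.

```python
import numpy as np
import pandas as pd

rng = np.random.RandomState(0)
pdays = rng.randint(0, 300, size=500).astype(float)
df = pd.DataFrame({
    "pdays": pdays,
    "pmonths": pdays / 30.0 + 0.01 * rng.normal(size=500),  # near duplicate of pdays
    "other": rng.normal(size=500),
})
target = pd.Series(0.02 * df["pdays"] + rng.normal(size=500), name="target")

# Pairwise absolute correlations, each pair listed once, strongest first.
corr = df.corr().abs()
pairs = (
    corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    .stack()
    .dropna()
    .sort_values(ascending=False)
)
print(pairs.head(10))

threshold = 0.9  # arbitrary cut-off
target_corr = df.corrwith(target).abs()
to_drop = set()
for (var1, var2), value in pairs.items():
    if value > threshold:
        # Drop whichever member of the pair tells us less about the target.
        weaker = var1 if target_corr[var1] < target_corr[var2] else var2
        to_drop.add(weaker)
print("candidates to drop:", to_drop)
```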