Questions tagged [feature-selection]

In machine learning, this is the process of selecting a subset of the most relevant features for constructing your model.

Feature selection is an important step for removing irrelevant or redundant features from your data. For more details, see Wikipedia.

1533 questions
27 votes · 3 answers

How to use scikit-learn PCA for features reduction and know which features are discarded

I am trying to run a PCA on a matrix of dimensions m x n where m is the number of features and n the number of samples. Suppose I want to preserve the nf features with the maximum variance. With scikit-learn I am able to do it in this way: from…
gc5 · 9,468
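A common resolution, sketched here with made-up data (and noting that scikit-learn expects samples in rows, so the asker's m-features × n-samples matrix must be transposed first): PCA never discards individual features outright, but the component loadings show which original features dominate each retained component.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.RandomState(0)
X = rng.rand(100, 5)            # hypothetical data: 100 samples x 5 features

pca = PCA(n_components=2)       # keep the 2 highest-variance components
pca.fit(X)

# Each retained component is a linear combination of ALL original
# features; the absolute loadings say which features dominate it.
loadings = np.abs(pca.components_)            # shape (2, 5)
dominant_feature = loadings.argmax(axis=1)    # top feature per component
```

Features whose loadings are small across every retained component are the ones PCA has effectively "discarded".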
25 votes · 3 answers

Put customized functions in Sklearn pipeline

In my classification scheme, there are several steps, including: SMOTE (Synthetic Minority Over-sampling Technique), Fisher criteria for feature selection, standardization (Z-score normalisation), and SVC (Support Vector Classifier). The main parameters to…
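A minimal sketch of the custom-step pattern: any object with `fit`/`transform` can slot into a `Pipeline`. The selector below is a toy stand-in, not real Fisher scoring, and note that SMOTE resamples `y`, so it needs imblearn's `Pipeline` rather than scikit-learn's.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

class TopMeanDiffSelector(BaseEstimator, TransformerMixin):
    """Toy selector: keep the k features with the largest absolute
    difference of class means (a crude stand-in for a Fisher score)."""
    def __init__(self, k=3):
        self.k = k

    def fit(self, X, y):
        diff = np.abs(X[y == 0].mean(axis=0) - X[y == 1].mean(axis=0))
        self.idx_ = np.argsort(diff)[::-1][: self.k]
        return self

    def transform(self, X):
        return X[:, self.idx_]

rng = np.random.RandomState(0)
X, y = rng.rand(60, 6), np.tile([0, 1], 30)   # hypothetical data

pipe = Pipeline([
    ("select", TopMeanDiffSelector(k=3)),   # custom step
    ("scale", StandardScaler()),            # Z-score normalisation
    ("clf", SVC()),
])
pipe.fit(X, y)
```

Because every step follows the same interface, the whole pipeline can then be tuned in one `GridSearchCV` call.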
24 votes · 2 answers

How to handle date variable in machine learning data pre-processing

I have a data-set that contains, among other variables, the time-stamp of the transaction in the format 26-09-2017 15:29:32. I need to find possible correlations and predictions of the sales (let's say in logistic regression). My questions are: How to…
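One standard answer, sketched with two made-up rows: parse the stamp with pandas, then decompose it into numeric parts a model can actually use.

```python
import pandas as pd

df = pd.DataFrame({"timestamp": ["26-09-2017 15:29:32", "01-10-2017 09:05:11"]})
df["timestamp"] = pd.to_datetime(df["timestamp"], format="%d-%m-%Y %H:%M:%S")

# A model cannot use the raw stamp; expose its parts as columns.
df["hour"] = df["timestamp"].dt.hour
df["weekday"] = df["timestamp"].dt.dayofweek    # Monday == 0
df["month"] = df["timestamp"].dt.month
```

For strongly cyclical parts (hour of day, month), a sine/cosine encoding is often preferred so that 23:00 and 00:00 end up close together.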
22 votes · 4 answers

Difference between PCA (Principal Component Analysis) and Feature Selection

What is the difference between Principal Component Analysis (PCA) and Feature Selection in Machine Learning? Is PCA a means of feature selection?
AbhinavChoudhury · 1,167
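The contrast can be shown in a few lines (iris data, standard scikit-learn calls): feature selection keeps a subset of the original columns unchanged, while PCA builds new axes that mix all of them, so PCA is feature extraction rather than selection.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest, f_classif

X, y = load_iris(return_X_y=True)                 # (150, 4)

# Selection: the surviving columns are original features, untouched.
X_sel = SelectKBest(f_classif, k=2).fit_transform(X, y)

# Extraction: every output axis is a mix of all four inputs.
X_pca = PCA(n_components=2).fit_transform(X)
```

After selection each output column still equals one of the input columns; after PCA none of them do.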
20 votes · 3 answers

Plot feature importance with xgboost

When I plot the feature importance, I get this messy plot. I have more than 7000 variables. I understand the built-in function only selects the most important, although the final graph is unreadable. This is the complete code: import numpy as…
rnv86 · 790
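With thousands of variables the usual fix is to plot only the top-k scores; xgboost's `plot_importance` takes a `max_num_features` argument for exactly this. The top-k selection itself is library-agnostic, sketched below with a scikit-learn forest standing in for the booster:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=200, n_features=50, random_state=0)
model = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

# Sort importances descending and keep only the k largest for plotting.
k = 10
order = np.argsort(model.feature_importances_)[::-1][:k]
top_scores = model.feature_importances_[order]
```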
20 votes · 6 answers

Retain feature names after Scikit Feature Selection

After running a Variance Threshold from Scikit-Learn on a set of data, it removes a couple of features. I feel I'm doing something simple yet stupid, but I'd like to retain the names of the remaining features. The following code: def…
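The usual answer is the selector's `get_support()` mask; a small sketch with a made-up DataFrame:

```python
import pandas as pd
from sklearn.feature_selection import VarianceThreshold

df = pd.DataFrame({
    "constant": [1.0, 1.0, 1.0, 1.0],   # zero variance -> dropped
    "noisy":    [0.1, 0.9, 0.4, 0.7],
    "trend":    [1.0, 2.0, 3.0, 4.0],
})

vt = VarianceThreshold()                # default: drop zero-variance features
X_reduced = vt.fit_transform(df)

# get_support() is a boolean mask over the ORIGINAL columns, so it can
# recover which names survived the thresholding.
kept = df.columns[vt.get_support()]
```

The same `get_support()` trick works for the other selectors (`SelectKBest`, `RFE`, …), since they share the transformer interface.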
18 votes · 4 answers

Difference between varImp (caret) and importance (randomForest) for Random Forest

I do not understand which is the difference between varImp function (caret package) and importance function (randomForest package) for a Random Forest model: I computed a simple RF classification model and when computing variable importance, I found…
Rafa OR · 339
17 votes · 1 answer

apache spark MLLib: how to build labeled points for string features?

I am trying to build a NaiveBayes classifier with Spark's MLLib which takes as input a set of documents. I'd like to put some things as features (i.e. authors, explicit tags, implicit keywords, category), but looking at the documentation it seems…
17 votes · 4 answers

Recursive feature elimination on Random Forest using scikit-learn

I'm trying to perform recursive feature elimination using scikit-learn and a random forest classifier, with OOB ROC as the method of scoring each subset created during the recursive process. However, when I try to use the RFECV method, I get an…
Bryan · 5,999
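A working configuration, sketched on synthetic data. RFECV requires an estimator that exposes `coef_` or `feature_importances_` after fitting, which a random forest does; true OOB scoring is not built in, so plain cross-validated ROC AUC is used here instead:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFECV

X, y = make_classification(n_samples=120, n_features=12,
                           n_informative=4, random_state=0)

selector = RFECV(
    RandomForestClassifier(n_estimators=30, random_state=0),
    step=1,                # drop one feature per elimination round
    cv=3,
    scoring="roc_auc",
)
selector.fit(X, y)
```

`selector.support_` then marks the retained features and `selector.ranking_` orders the eliminated ones.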
16 votes · 3 answers

sklearn logistic regression - important features

I'm pretty sure it's been asked before, but I'm unable to find an answer. Running Logistic Regression using sklearn on Python, I'm able to transform my dataset to its most important features using the Transform method classf =…
mel · 161
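The estimator-level `transform` the question refers to was deprecated in favour of `SelectFromModel`; a sketch on synthetic data, where the |coefficient| magnitudes act as the importance scores:

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=200, n_features=10, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X, y)

# Keep the features whose |coef| exceeds the mean |coef|.
sfm = SelectFromModel(clf, threshold="mean", prefit=True)
X_important = sfm.transform(X)
```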
15 votes · 3 answers

Feature importances - Bagging, scikit-learn

For a project I am comparing a number of decision-tree ensembles, using the regression algorithms (Random Forest, Extra Trees, AdaBoost and Bagging) of scikit-learn. To compare and interpret them I use the feature importance, though for the bagging decision…
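Bagging meta-estimators expose no `feature_importances_` of their own, but each fitted base tree does, so averaging over `estimators_` gives a comparable score; a sketch on synthetic regression data:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import BaggingRegressor
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=150, n_features=8, random_state=0)

bag = BaggingRegressor(DecisionTreeRegressor(random_state=0),
                       n_estimators=20, random_state=0).fit(X, y)

# Average the per-tree importances (valid here because max_features=1.0,
# so every tree was trained on all eight columns).
importances = np.mean([t.feature_importances_ for t in bag.estimators_],
                      axis=0)
```

If `max_features < 1.0`, the per-tree scores must first be scattered back to the original column indices via `bag.estimators_features_`.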
15 votes · 1 answer

find important features for classification

I'm trying to classify some EEG data using a logistic regression model (this seems to give the best classification of my data). The data I have is from a multichannel EEG setup so in essence I have a matrix of 63 x 116 x 50 (that is channels x time…
Mads Jensen · 663
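scikit-learn estimators want a 2-D (n_samples, n_features) array, so a channels × time × trials cube has to be reshaped trial-wise first; with an L1 penalty the classifier then doubles as an embedded feature selector. Random data below stands in for the EEG:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.RandomState(0)
data = rng.randn(63, 116, 50)          # channels x time x trials
labels = rng.randint(0, 2, 50)

# Move trials to axis 0, then flatten channel x time per trial.
X = data.transpose(2, 0, 1).reshape(50, 63 * 116)

clf = LogisticRegression(penalty="l1", solver="liblinear", C=0.1)
clf.fit(X, labels)
kept = np.flatnonzero(clf.coef_)       # features surviving the L1 penalty
```

The surviving flat indices can be unravelled back to (channel, time) pairs with `np.unravel_index(kept, (63, 116))`.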
14 votes · 1 answer

Perform Chi-2 feature selection on TF and TF*IDF vectors

I'm experimenting with Chi-2 feature selection for some text classification tasks. I understand that the Chi-2 test checks the dependence between two categorical variables, so if we perform Chi-2 feature selection for a binary text classification problem…
Moses Xu · 2,140
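scikit-learn's `chi2` only requires non-negative inputs, so both raw term counts and TF-IDF weights are acceptable; a toy four-document sketch:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_selection import SelectKBest, chi2

docs = ["good movie", "great film", "bad movie", "terrible film"]
labels = [1, 1, 0, 0]

X = CountVectorizer().fit_transform(docs)      # term counts (non-negative)
selector = SelectKBest(chi2, k=2).fit(X, labels)
scores = selector.scores_                      # one chi2 score per term
```

Swapping `CountVectorizer` for `TfidfVectorizer` changes the scores but not the mechanics.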
13 votes · 1 answer

Sklearn Chi2 For Feature Selection

I'm learning about chi2 for feature selection and came across code like this. However, my understanding of chi2 was that higher scores mean that the feature is more independent (and therefore less useful to the model), and so we would be interested in…
RSHAP · 2,337
13 votes · 2 answers

Python's implementation of Mutual Information

I am having some issues implementing the Mutual Information Function that Python's machine learning libraries provide, in particular : sklearn.metrics.mutual_info_score(labels_true, labels_pred,…
and_apo · 1,217
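The function's behaviour is easiest to pin down on tiny hand-built labelings: the MI of a labelling with itself equals its entropy (returned in nats), and a perfectly balanced independent labelling scores zero.

```python
import numpy as np
from sklearn.metrics import mutual_info_score

a = [0, 0, 1, 1]
same = [0, 0, 1, 1]     # identical partition
indep = [0, 1, 0, 1]    # joint distribution = product of marginals

mi_same = mutual_info_score(a, same)     # entropy of a: ln 2
mi_indep = mutual_info_score(a, indep)   # exactly 0
```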