Highest Voted 'feature-selection' Questions

13

votes

2 answers

Recursive feature elimination and grid search using scikit-learn

I would like to perform recursive feature elimination with nested grid search and cross-validation for each feature subset using scikit-learn. From the RFECV documentation it sounds like this type of operation is supported using the estimator_params…

scikit-learn feature-selection

asked May 22 '14 at 19:53

DavidS

2,344
1
17
18

13

votes

1 answer

What to do first: Feature Selection or Model Parameters Setting?

This is more of a theoretical question. I'm working with the scikit-learn package to perform an NLP task. Sklearn provides many methods for both feature selection and setting of model parameters. I'm wondering what I should do first. If I…

machine-learning scikit-learn feature-selection

asked Sep 17 '12 at 15:24

feralvam

1,603
2
17
20

12

votes

5 answers

Get a feature importance from SHAP Values

I would like to get a dataframe of important features. With the code below I have got the shap_values, and I am not sure what the values mean. My df has 142 features and 67 experiments, but I got an array with ca. 2500 values. explainer =…

python random-forest feature-selection

asked Jan 01 '21 at 22:15

Parsyk

321
1
3
11

12

votes

3 answers

All intermediate steps should be transformers and implement fit and transform

I am implementing a pipeline that selects important features and then uses the same features to train my random forest classifier. Following is my code. m = ExtraTreesClassifier(n_estimators = 10) m.fit(train_cv_x,train_cv_y) sel =…

python machine-learning scikit-learn feature-selection

asked Feb 13 '18 at 01:59

Stupid420

1,347
3
19
44

11

votes

2 answers

SciKit-Learn Label Encoder resulting in error 'argument must be a string or number'

I'm a bit confused - creating an ML model here. I'm at the step where I'm trying to take categorical features from a "large" dataframe (180 columns) and one-hot them so that I can find the correlation between the features and select the "best"…

python machine-learning scikit-learn feature-selection one-hot-encoding

asked Nov 14 '19 at 23:47

mikelowry

1,307
4
21
43

11

votes

2 answers

Dealing with datasets with repeated multivalued features

We have a dataset that is in sparse representation and has 25 features and 1 binary label. For example, a line of the dataset is: Label: 0 exid: 24924687 Features: 11:0 12:1 13:0 14:6 15:0 17:2 17:2 17:2 17:2 17:2 17:2 21:11 21:42 21:42 21:42 21:42…

python scipy feature-selection multivalue-database

asked Jul 13 '19 at 19:55

Mo-

790
2
10
23

11

votes

3 answers

Best practice for holding huge lists of data in Java

I'm writing a small system in Java in which I extract n-gram features from text files and later need to perform a feature selection process to select the most discriminative features. The feature extraction process for a single file returns a…

java data-structures feature-extraction feature-selection computation

asked Jan 14 '15 at 13:17

Aviadjo

635
5
17
36

11

votes

2 answers

Choosing Features to identify Twitter Questions as "Useful"

I collect a bunch of questions from Twitter's stream by using a regular expression to pick out any tweet containing text that starts with a question word (who, what, when, where, etc.) and ends with a question mark. As such, I end up getting…

machine-learning classification nltk feature-selection

asked Jan 14 '13 at 03:00

bili

610
2
9
20

10

votes

2 answers

Logistic Regression: How to find the top three features that have the highest weights?

I am working on the UCI breast cancer dataset and trying to find the top 3 features that have the highest weights. I was able to find the weights of all features using logmodel.coef_, but how can I get the feature names? Below is my code, output and dataset…

python machine-learning scikit-learn logistic-regression feature-selection

asked Apr 23 '17 at 21:20

jubins

317
2
7
18

10

votes

3 answers

Fast Information Gain computation

I need to compute Information Gain scores for >100k features in >10k documents for text classification. The code below works fine, but it is very slow on the full dataset: it takes more than an hour on a laptop. The dataset is 20newsgroup and I am using…

python performance machine-learning scikit-learn feature-selection

asked Aug 23 '14 at 13:24

p_b_garcia

101
1
1
5

9

votes

2 answers

Interpreting logistic regression feature coefficient values in sklearn

I have fit a logistic regression model to my data. Imagine I have four features: 1) which condition the participant received, 2) whether the participant had any prior knowledge/background about the phenomenon tested (binary response in…

python scikit-learn logistic-regression feature-selection coefficients

asked Jun 24 '18 at 01:07

Jane Sully

3,137
10
48
87

9

votes

2 answers

Attribute's predictive capacity for a particular target in Python, using feature selection in Sklearn

Are there any feature selection methods in Scikit-Learn (or algos in general) that give weights of an attribute's ability/predictive-capacity/importance to predict a specific target? For example, the from sklearn.datasets import load_iris, ranking…

python machine-learning scikit-learn classification feature-selection

asked Nov 23 '16 at 21:19

O.rka

29,847
68
194
309

9

votes

1 answer

What does get_fscore() of an xgboost ML model do?

Does anybody know how the numbers are calculated? The documentation says that this function will "Get feature importance of each feature", but there is no explanation of how to interpret the results.

python feature-selection xgboost

asked Nov 11 '15 at 14:03

Peter Lenaers

419
3
8
17

9

votes

3 answers

Python feature selection in pipeline: how to determine feature names?

I used a pipeline and grid_search to select the best parameters and then used these parameters to fit the best pipeline ('best_pipe'). However, since the feature_selection (SelectKBest) is in the pipeline, there has been no fit applied to SelectKBest. I…

scikit-learn pipeline feature-selection

asked Oct 27 '15 at 18:42

figgy

595
2
5
11

9

votes

0 answers

Meaning of GridSearchCV with RFECV in sklearn

Based on "Recursive feature elimination and grid search using scikit-learn", I know that RFECV can be combined with GridSearchCV to obtain better parameter settings for a model such as a linear SVM. As said in the answer, there are two ways: "Run…

scikit-learn cross-validation feature-selection

asked Apr 15 '15 at 04:16

Francis

6,416
5
24
32
