Questions tagged [feature-selection]

In machine learning, this is the process of selecting a subset of the most relevant features for constructing your model.

Feature selection is an important step to remove irrelevant or redundant features from our data. For more details, see Wikipedia.

1533 questions
7
votes
1 answer

How does SelectKBest (chi2) calculate its scores?

I am trying to find the most valuable features by applying feature selection methods to my dataset. I'm using the SelectKBest function for now. I can generate the score values and sort them as I want, but I don't understand exactly how this score…
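A minimal sketch of what `SelectKBest(chi2)` computes (the iris data here is purely for illustration): the chi-squared statistic is calculated between each non-negative feature and the class labels, and the `k` highest-scoring features are kept.

```python
# SelectKBest(chi2) scores each feature by its chi-squared statistic
# against the target; higher score = stronger dependence on the labels.
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, chi2

X, y = load_iris(return_X_y=True)

selector = SelectKBest(score_func=chi2, k=2)
X_new = selector.fit_transform(X, y)

print(selector.scores_)   # one chi2 statistic per original feature
print(X_new.shape)        # only the 2 best-scoring features remain
```

`scores_` holds the raw statistics, and `pvalues_` the corresponding p-values, so the ranking can be inspected directly.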
7
votes
1 answer

feature names from sklearn pipeline: not fitted error

I'm working with scikit-learn on a text classification experiment. Now I would like to get the names of the best-performing, selected features. I tried some of the answers to similar questions, but nothing works. The last lines of code are an…
Bambi
  • 715
  • 2
  • 8
  • 19
7
votes
1 answer

How is feature importance calculated for GradientBoostingClassifier?

I'm using scikit-learn's gradient-boosted trees classifier, GradientBoostingClassifier. It makes feature importance scores available in feature_importances_. How are these feature importances calculated? I'd like to understand what algorithm…
D.W.
  • 3,382
  • 7
  • 44
  • 110
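In scikit-learn's tree ensembles, `feature_importances_` is impurity-based: each split's impurity decrease (weighted by the fraction of samples reaching the node) is credited to the split feature, averaged over all trees, and normalized to sum to 1. A small sketch on synthetic data:

```python
# feature_importances_ = normalized mean decrease in impurity per feature,
# accumulated over every split in every tree of the ensemble.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

X, y = make_classification(n_samples=200, n_features=5,
                           n_informative=2, random_state=0)
gbc = GradientBoostingClassifier(random_state=0).fit(X, y)

imp = gbc.feature_importances_
print(imp)         # one value per feature
print(imp.sum())   # normalized, so the values add up to 1
```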
7
votes
2 answers

Multi-label feature selection using sklearn

I'm looking to perform feature selection with a multi-label dataset using sklearn. I want to get the final set of features across labels, which I will then use in another machine learning package. I was planning to use the method I saw here, which…
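One common approach to this (an assumption here, not necessarily the method the question links to): run a univariate selector per label column and take the union of the selected feature indices, giving a single feature set usable in any other package.

```python
# Per-label SelectKBest, then the union of selected indices across labels.
# Random data for illustration only; chi2 requires non-negative features.
import numpy as np
from sklearn.feature_selection import SelectKBest, chi2

rng = np.random.RandomState(0)
X = rng.rand(100, 10)
Y = rng.randint(0, 2, size=(100, 3))   # 3 binary labels

selected = set()
for j in range(Y.shape[1]):
    sel = SelectKBest(chi2, k=3).fit(X, Y[:, j])
    selected.update(np.flatnonzero(sel.get_support()))

print(sorted(selected))   # final feature indices shared across all labels
```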
7
votes
2 answers

How can I get the relative importance of features of a logistic regression for a particular prediction?

I am using a Logistic Regression (in scikit) for a binary classification problem, and am interested in being able to explain each individual prediction. To be more precise, I'm interested in predicting the probability of the positive class, and…
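Because logistic regression is linear in the log-odds, a single prediction decomposes exactly into per-feature contributions: log-odds = `intercept_` + Σ `coef_[i] * x[i]`. A sketch on synthetic data (note the contributions are only comparable across features if the inputs are on similar scales):

```python
# Per-feature contribution of one sample to the predicted log-odds,
# and a check that the sigmoid of their sum recovers predict_proba.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=200, n_features=4, random_state=0)
clf = LogisticRegression().fit(X, y)

x = X[0]
contrib = clf.coef_[0] * x                    # contribution per feature
logit = clf.intercept_[0] + contrib.sum()
prob = 1.0 / (1.0 + np.exp(-logit))           # sigmoid of the log-odds

print(contrib)
print(prob, clf.predict_proba([x])[0, 1])     # the two probabilities agree
```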
7
votes
3 answers

How does sklearn's random forest index feature_importances_?

I have used the RandomForestClassifier in sklearn to determine the important features in my dataset. How am I able to return the actual feature names (my variables are labeled x1, x2, x3, etc.) rather than their relative name (it tells me the…
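`feature_importances_` is ordered exactly like the training columns, so pairing it with the column names is enough. A minimal sketch using the iris data for illustration:

```python
# Zip the importance array with the original column names, then sort.
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

data = load_iris()
rf = RandomForestClassifier(random_state=0).fit(data.data, data.target)

ranked = sorted(zip(data.feature_names, rf.feature_importances_),
                key=lambda t: t[1], reverse=True)
for name, score in ranked:
    print(f"{name}: {score:.3f}")   # feature name next to its importance
```

With a pandas DataFrame the same idea works via `zip(df.columns, rf.feature_importances_)`.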
7
votes
2 answers

Normalizing feature values for SVM

I've been playing with some SVM implementations and I am wondering: what is the best way to normalize feature values to fit into one range (from 0 to 1)? Let's suppose I have 3 features with values in the ranges 3 to 5, 0.02 to 0.05, and 10 to 15. How do I…
user3010273
  • 890
  • 5
  • 11
  • 18
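For mapping each feature to [0, 1] regardless of its original range, scikit-learn's `MinMaxScaler` is the standard tool; the key point is to fit it on the training data only and reuse the fitted scaler at test time. A sketch with the three ranges from the question:

```python
# MinMaxScaler maps each column's min to 0 and max to 1, independently.
import numpy as np
from sklearn.preprocessing import MinMaxScaler

X_train = np.array([[3.0, 0.02, 10.0],
                    [5.0, 0.05, 15.0],
                    [4.0, 0.03, 12.0]])

scaler = MinMaxScaler()
X_scaled = scaler.fit_transform(X_train)
print(X_scaled.min(axis=0), X_scaled.max(axis=0))  # per-column 0s and 1s

# At prediction time, reuse the SAME fitted scaler:
# X_test_scaled = scaler.transform(X_test)
```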
6
votes
2 answers

sklearn Pipeline: argument of type 'ColumnTransformer' is not iterable

I am attempting to use a pipeline to feed an ensemble voting classifier as I want the ensemble learner to use models that train on different feature sets. For this purpose, I followed the tutorial available at [1]. Following is the code that I…
6
votes
1 answer

Getting TypeError: '(slice(None, None, None), array([0, 1, 2, 3, 4]))' is an invalid key

Trying to use BorutaPy for feature selection, but getting a TypeError: '(slice(None, None, None), array([0, 1, 2, 3, 4]))' is an invalid key. from sklearn.ensemble import RandomForestClassifier from boruta import BorutaPy rf =…
6
votes
1 answer

Sentiment analysis Pipeline, problem getting the correct feature names when feature selection is used

In the following example I use a Twitter dataset to perform sentiment analysis. I use an sklearn pipeline to perform a sequence of transformations, add features, and add a classifier. The final step is to visualise the words that have the higher…
6
votes
2 answers

How to handle One-Hot Encoding in production environment when number of features in Training and Test are different?

While doing certain experiments, we usually train on 70% of the data and test on the remaining 30%. But what happens when your model is in production? The following may occur: Training Set: ----------------------- | Ser |Type Of Car | ----------------------- | 1 |…
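One standard way to keep the feature count aligned between training and production: fit the encoder once on the training data and let unseen categories encode as all-zero rows instead of raising. A sketch with hypothetical car makes (the column contents are illustrative):

```python
# OneHotEncoder(handle_unknown="ignore"): categories unseen at fit time
# become all-zero rows, so the column count never changes in production.
import numpy as np
from sklearn.preprocessing import OneHotEncoder

train = np.array([["BMW"], ["Audi"], ["Toyota"]])
enc = OneHotEncoder(handle_unknown="ignore").fit(train)

prod = np.array([["Audi"], ["Tesla"]])   # "Tesla" was never seen in training
encoded = enc.transform(prod).toarray()
print(encoded)   # the "Tesla" row is all zeros; still 3 columns
```

Persisting the fitted encoder (e.g. with `joblib`) alongside the model keeps the mapping identical at serving time.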
6
votes
2 answers

python spark: narrowing down most relevant features using PCA

I am using Spark 2.2 with Python. I am using PCA from the ml.feature module, and VectorAssembler to feed my features to PCA. To clarify, let's say I have a table with three columns, col1, col2 and col3; then I am doing: from pyspark.ml.feature…
6
votes
1 answer

Wrapper Methods for feature selection (Machine Learning) In Scikit Learn

I am trying to decide between scikit-learn and the Weka data mining tool for my machine learning project. However, I realized the need for feature selection. I would like to know if scikit-learn has wrapper methods for feature selection.
Sean Sog Miller
  • 207
  • 1
  • 4
  • 11
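scikit-learn does ship a wrapper-style selector: RFE (recursive feature elimination) repeatedly fits an estimator and drops the weakest features until the requested number remains. A minimal sketch on synthetic data:

```python
# RFE wraps any estimator exposing coef_ or feature_importances_ and
# eliminates the lowest-ranked features one step at a time.
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=200, n_features=8,
                           n_informative=3, random_state=0)

rfe = RFE(LogisticRegression(max_iter=500), n_features_to_select=3).fit(X, y)
print(rfe.support_)   # boolean mask of the 3 surviving features
print(rfe.ranking_)   # 1 = selected; higher = eliminated earlier
```

`RFECV` is the cross-validated variant that also chooses how many features to keep, and `SequentialFeatureSelector` offers forward/backward wrapper selection.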
6
votes
1 answer

How to efficiently retrieve the top K similar documents by cosine similarity using Python?

I am handling one hundred thousand (100,000) documents (mean document length is about 500 terms). For each document, I want to get the top k (e.g. k = 5) similar documents by cosine similarity. How can I do this efficiently in Python? Here is what I…
user1024
  • 982
  • 4
  • 13
  • 26
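One efficient pattern (a sketch on a tiny made-up corpus): since TF-IDF rows are L2-normalized by default, the dot product of the matrix with its transpose *is* the cosine similarity, and `np.argpartition` finds the k largest entries per row without a full sort.

```python
# Top-k neighbours by cosine similarity: sparse dot product of
# L2-normalized TF-IDF rows, then an O(n) partial partition per row.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

docs = ["the cat sat", "the cat ran", "dogs bark loudly",
        "the dog ran", "cats and dogs"]
tfidf = TfidfVectorizer().fit_transform(docs)   # rows are unit-norm

k = 2
sims = (tfidf @ tfidf.T).toarray()   # cosine similarity matrix
np.fill_diagonal(sims, -1.0)         # exclude each document itself

# argpartition avoids sorting all n similarities; only the k largest matter
topk = np.argpartition(-sims, k, axis=1)[:, :k]
print(topk)   # indices of each document's k most similar documents
```

At the 100,000-document scale, processing the query rows in chunks (or using an approximate-nearest-neighbour index) keeps the similarity matrix from exhausting memory.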
6
votes
1 answer

How to change feature weights when training a model with sklearn?

I want to classify text using sklearn. First I used bag-of-words to train on the data; the bag-of-words feature space is really large, more than 10,000 features, so I reduced it to 100 with SVD. But here I want to add some other…
HAO CHEN
  • 1,209
  • 3
  • 18
  • 32