Questions tagged [feature-selection]

In machine learning, this is the process of selecting a subset of the most relevant features for constructing your model.

Feature selection is an important step to remove irrelevant or redundant features from our data. For more details, see Wikipedia.

1533 questions
8 votes, 1 answer

Genetic algorithms: fitness function for feature selection algorithm

I have an n x m data set with n observations, where each observation consists of values for m attributes. Each observation also has an observed result assigned to it. m is big, too big for my task. I am trying to find the best and smallest subset…
agnieszka
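A common way to define such a fitness function (a sketch with made-up toy data, not the asker's): score a candidate feature subset by cross-validated accuracy minus a penalty proportional to subset size, so the genetic search favors small, accurate subsets.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def fitness(X, y, mask, size_penalty=0.01):
    """Score a candidate feature subset (boolean mask of length m).

    Higher is better: cross-validated accuracy minus a penalty on the
    number of selected features, so among equally accurate subsets the
    GA prefers the smaller one.
    """
    if not mask.any():          # an empty subset is invalid
        return -np.inf
    model = LogisticRegression(max_iter=1000)
    acc = cross_val_score(model, X[:, mask], y, cv=3).mean()
    return acc - size_penalty * mask.sum()

# Toy usage: 40 observations, 5 attributes, two candidate masks.
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 5))
y = (X[:, 0] + X[:, 1] > 0).astype(int)   # only features 0 and 1 matter
good = np.array([True, True, False, False, False])
bad = np.array([False, False, True, True, True])
print(fitness(X, y, good) > fitness(X, y, bad))
```

The `size_penalty` weight is the knob that trades accuracy against subset size; it would need tuning for a real data set.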
8 votes, 1 answer

How to use RFE with xgboost Booster?

I'm currently using xgb.train(...), which returns a Booster, but I'd like to use RFE to select the best 100 features. The returned Booster cannot be used in RFE as it's not a sklearn estimator. XGBClassifier is the sklearn API into the xgboost…
pmdaly
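The usual workaround is to let RFE drive a sklearn-compatible wrapper rather than a raw Booster. A sketch of the pattern, using sklearn's GradientBoostingClassifier as a stand-in so it runs without xgboost installed; with xgboost available you would pass `XGBClassifier()` in its place:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier  # stand-in for xgboost.XGBClassifier
from sklearn.feature_selection import RFE

X, y = make_classification(n_samples=200, n_features=20,
                           n_informative=5, random_state=0)

# RFE needs an estimator exposing fit() and feature_importances_ (or coef_);
# a raw Booster from xgb.train() has neither, but the sklearn wrapper does.
selector = RFE(GradientBoostingClassifier(n_estimators=20, random_state=0),
               n_features_to_select=5)
selector.fit(X, y)
print(selector.support_.sum())  # number of features kept
```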
8 votes, 1 answer

scikit-learn feature ranking returns identical values

I'm using scikit-learn's RFECV class to perform feature selection. I'm interested in identifying the relative importance of a bunch of variables. However, scikit-learn returns the same ranking (1) for multiple variables. This can also be seen in…
pir
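Those repeated 1s are by design: RFECV assigns rank 1 to every feature it keeps and ranks 2, 3, … only to the eliminated ones, so ties among selected features are expected. A small sketch on synthetic data:

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFECV
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=200, n_features=10,
                           n_informative=4, random_state=0)

rfecv = RFECV(LogisticRegression(max_iter=1000), cv=3)
rfecv.fit(X, y)

# ranking_ is 1 for every *selected* feature; relative importance
# within the selected set is not reported by ranking_ itself.
print(sorted(set(rfecv.ranking_[rfecv.support_])))
```

To rank the selected features against each other, one would instead inspect the fitted estimator's `coef_` or `feature_importances_`.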
8 votes, 3 answers

Feature Selection in PySpark

I am working on a machine learning model with a data set of shape 1,456,354 x 53, and I want to do feature selection on it. I know how to do feature selection in Python using the following code: from sklearn.feature_selection import RFECV,RFE logreg =…
8 votes, 2 answers

Feature selection on a keras model

I was trying to find the features that dominate the output of my regression model. Following is my code: seed = 7 np.random.seed(seed) estimators = [] estimators.append(('mlp', KerasRegressor(build_fn=baseline_model, epochs=3, …
Klaus
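One model-agnostic way to find the dominating inputs of a regression model is permutation importance: shuffle each column in turn and measure how much the score drops. A sketch on made-up data, using sklearn's MLPRegressor in place of the Keras model so it is self-contained:

```python
import numpy as np
from sklearn.inspection import permutation_importance
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 4))
y = 3 * X[:, 0] + 0.1 * X[:, 3] + rng.normal(scale=0.1, size=300)  # feature 0 dominates

model = MLPRegressor(hidden_layer_sizes=(16,), max_iter=2000,
                     random_state=0).fit(X, y)

# A large score drop when a column is shuffled means the model
# relies heavily on that feature.
result = permutation_importance(model, X, y, n_repeats=5, random_state=0)
print(np.argmax(result.importances_mean))
```

The same call works for a Keras model wrapped in a sklearn-compatible estimator (e.g. KerasRegressor), since permutation_importance only needs fit/predict.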
8 votes, 4 answers

XGBoost plot importance has no property max_num_features

xgboost's plotting API states: xgboost.plot_importance(booster, ax=None, height=0.2, xlim=None, ylim=None, title='Feature importance', xlabel='F score', ylabel='Features', importance_type='weight', max_num_features=None, grid=True, **kwargs)¶ Plot…
Carlo Mazzaferro
8 votes, 1 answer

Most important original feature(s) of Principal Component Analysis

I am doing PCA and I am interested in which original features are most important. Let me illustrate this with an example: import numpy as np from sklearn.decomposition import PCA X = np.array([[1,-1, -1,-1], [1,-2, -1,-1], [1,-3, -2,-1], [1,1,…
Guido
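The loadings in `pca.components_` answer this directly: the original feature with the largest absolute loading on a component contributes most to it. A sketch with a small made-up matrix (not the asker's truncated data):

```python
import numpy as np
from sklearn.decomposition import PCA

X = np.array([[1, -1, -1, -1],
              [1, -2, -1, -1],
              [1, -3, -2, -1],
              [1,  1,  1, -1],
              [1,  2,  1, -1],
              [1,  3,  2, -1]])

pca = PCA(n_components=1).fit(X)

# components_[i, j] is the loading of original feature j on component i;
# constant columns (0 and 3 here) get ~zero loading after centering.
loadings = np.abs(pca.components_[0])
print(np.argmax(loadings))
```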
8 votes, 3 answers

Named entity recognition (NER) features

I'm new to Named Entity Recognition and I'm having some trouble understanding what/how features are used for this task. Some papers I've read so far mention features used, but don't really explain them, for example in Introduction to the…
8 votes, 1 answer

Sklearn Univariate Selection: Features are Constant

I am getting the following warning message when trying to use Feature Selection and f_classif (ANOVA test) on some data in sklearn: C:\Users\Alexander\Anaconda3\lib\site-packages\sklearn\feature_selection\univariate_selection.py:113: UserWarning:…
Alex
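That warning means some columns have zero variance, so the ANOVA F-statistic is undefined for them. A common fix (sketched here with synthetic data) is to drop constant columns with VarianceThreshold before scoring:

```python
import numpy as np
from sklearn.feature_selection import VarianceThreshold, f_classif

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
X[:, 1] = 7.0                      # a constant column triggers the warning
y = rng.integers(0, 2, size=50)

# Remove zero-variance columns first, then run the ANOVA test cleanly.
vt = VarianceThreshold()
X_clean = vt.fit_transform(X)

F, p = f_classif(X_clean, y)
print(X_clean.shape, vt.get_support())
```

`vt.get_support()` records which original columns survived, so results can be mapped back to the original feature indices.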
8 votes, 1 answer

Scikit-learn zip argument #1 must support iteration

I have the following pipeline to perform machine learning on a corpus. It first extracts text, uses TfidfVectorizer to extract n-grams and then selects the best features. The pipeline is working fine without the feature selection step. However, with…
Justin D.
8 votes, 1 answer

scikit-learn: get selected features when using SelectKBest within pipeline

I am trying to do feature selection as part of a scikit-learn pipeline, in a multi-label scenario. My purpose is to select the best k features, for some given k. It might be simple, but I don't understand how to get the selected features' indices…
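The fitted SelectKBest can be pulled back out of the pipeline via `named_steps`, and `get_support(indices=True)` then gives the selected column indices. A sketch on synthetic (single-label) data; the access pattern is the same in a multi-label setup:

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

X, y = make_classification(n_samples=100, n_features=8,
                           n_informative=3, random_state=0)

pipe = Pipeline([
    ("select", SelectKBest(f_classif, k=3)),
    ("clf", LogisticRegression(max_iter=1000)),
])
pipe.fit(X, y)

# The fitted selector lives inside the pipeline under its step name;
# get_support(indices=True) returns the indices of the k kept columns.
selected = pipe.named_steps["select"].get_support(indices=True)
print(len(selected))
```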
8 votes, 1 answer

Combining Recursive Feature Elimination and Grid Search in scikit-learn

I am trying to combine recursive feature elimination and grid search in scikit-learn. As you can see from the code below (which works), I am able to get the best estimator from a grid search and then pass that estimator to RFECV. However, I would…
Mark Conway
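One way to combine the two (a sketch, not necessarily the asker's setup) is to wrap RFECV itself in GridSearchCV and tune the inner estimator through the `estimator__` parameter prefix, so feature elimination is re-run for each hyperparameter candidate:

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFECV
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=150, n_features=10,
                           n_informative=4, random_state=0)

# GridSearchCV reaches the estimator wrapped by RFECV via the
# "estimator__" prefix exposed by get_params(deep=True).
search = GridSearchCV(
    RFECV(LogisticRegression(max_iter=1000), cv=3),
    param_grid={"estimator__C": [0.1, 1.0]},
    cv=3,
)
search.fit(X, y)
print(search.best_params_)
```

Note this nests cross-validation (RFECV inside GridSearchCV), so it is considerably more expensive than either step alone.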
8 votes, 3 answers

Supervised Learning on Coding Style - Feature Selection (Scikit Learn)

I am researching whether it is possible to automate the scoring of students' code based on coding style. This includes things like avoiding duplicate code, commented-out code, bad naming of variables and more. We are trying to learn…
8 votes, 1 answer

Example for svm feature selection in R

I'm trying to apply feature selection (e.g. recursive feature elimination) with SVM, using R. I've installed Weka, which supports feature selection in LibSVM, but I haven't found any example of the syntax for SVM or anything similar. A short…
Ofer Rahat
8 votes, 3 answers

Matlab: Kmeans gives different results each time

I am running kmeans in Matlab on a 400x1000 matrix, and for some reason I get different results each time I run the algorithm. Below is a code example: [idx, ~, ~, ~] = kmeans(factor_matrix, 10, 'dist','sqeuclidean','replicates',20); For some reason,…
user1129988