Questions tagged [feature-selection]

In machine learning, this is the process of selecting a subset of the most relevant features to use in constructing your data model.

Feature selection is an important step to remove irrelevant or redundant features from our data. For more details, see Wikipedia.

1533 questions
-1
votes
1 answer

How to create new columns in a grouped DataFrame?

I have a DataFrame grouped by a categorical feature. For example, I have df df[['APP_NO', 'REPAY_METHOD', 'RESIDUAL_DEBT']] \ .groupby(['APP_NO', 'REPAY_METHOD']).agg({'RESIDUAL_DEBT' : 'sum'}) ID NUM CAT_FEAT aggr 1 123 2 …
-1
votes
1 answer

Feature importances with forests of trees

I am trying to find out the importance of my features and wanted to understand how the forest of trees works. To my understanding, it builds decision trees, and the bar graphs show how much variance is explained by each feature, which in turn shows the…
Pads
-1
votes
2 answers

Handle mismatch in number of features in Training Data and Prediction Data

I have 6 text features (say f1, f2, ..., f6) available for the data on which I have trained a model. But when this model is deployed and a new data point arrives, for which I have to make a prediction using this model, it has only 2 features (f1 and f2).…
user5483003
-1
votes
1 answer

Is there any best practice for feature selection for a machine learning model doing click-through rate prediction?

For an e-commerce company, how should features be picked when doing click-through rate prediction using logistic regression, SVM, or other machine learning models? I tried gender and statistical features from goods tags, and used SVM and NN, but the result was very…
-1
votes
1 answer

Does feature importance change with number of max_features selected in a RandomForestRegressor, scikit-learn?

In one of my projects, I was trying to determine which of my 12 features are the strongest drivers of a target variable using RandomForestRegressor (sklearn). RandomForest nicely gives you a list of feature importances that explains which of…
ThReSholD
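One way to explore the question above is to fit the same forest with two `max_features` settings and compare the resulting importances. This is only a sketch on synthetic data: the 12 random columns, the target formula, and the chosen settings (`None` for all features, `"sqrt"` for a random subset per split) are assumptions for illustration, not the asker's data.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic stand-in for a 12-feature dataset (assumption, not the asker's data).
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 12))
# Only the first two columns drive the target; the rest are noise.
y = 2.0 * X[:, 0] + X[:, 1] + rng.normal(scale=0.1, size=300)

importances = {}
for max_features in (None, "sqrt"):
    rf = RandomForestRegressor(
        n_estimators=100, max_features=max_features, random_state=0
    )
    rf.fit(X, y)
    importances[max_features] = rf.feature_importances_

# Print both importance vectors side by side for comparison.
for setting, values in importances.items():
    print(setting, np.round(values, 3))
```

With a smaller `max_features`, weaker or noisy features tend to be picked for splits more often, so the importance values typically shift even when the ranking of the strongest features stays the same.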
-1
votes
2 answers

Optimizing number of optimum features

I am training a neural network using Keras. Every time I train my model, I use a slightly different set of features selected using tree-based feature selection via ExtraTreesClassifier(). After training every time, I compute the AUCROC on my validation…
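A minimal sketch of tree-based selection with `ExtraTreesClassifier` wrapped in scikit-learn's `SelectFromModel` is shown here. The synthetic data, the `threshold="mean"` cutoff, and the fixed `random_state` are assumptions for illustration; fixing the seed is one way to keep the selected set from changing between runs.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.feature_selection import SelectFromModel

# Synthetic data (assumption): the label depends only on columns 0 and 1.
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 6))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# Keep features whose importance is at least the mean importance.
selector = SelectFromModel(
    ExtraTreesClassifier(n_estimators=100, random_state=0),
    threshold="mean",
)
X_selected = selector.fit_transform(X, y)

# Column indices of the features that were kept.
selected_idx = np.flatnonzero(selector.get_support())
print(selected_idx, X_selected.shape)
```

The reduced matrix `X_selected` can then be passed to any downstream model, such as a Keras network.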
-1
votes
1 answer

Extracting Features from the image manually

I am working on an image classification problem. How can I manually find specific features in an image that will help to build a DNN? Consider an image of a man talking on the phone while driving, to be classified as distracted.
-1
votes
1 answer

How do I get the best features with a Spark LinearSVC model?

I am trying to use ChiSqSelector to determine the best features for a Spark 2.2 LSVCModel, thus: import org.apache.spark.ml.feature.ChiSqSelector val chiSelector = new ChiSqSelector().setNumTopFeatures(5). setFeaturesCol("features"). …
schoon
-1
votes
1 answer

Feature scaling and its effect on various algorithms

Despite going through lots of similar questions related to this, I still could not understand why some algorithms are susceptible to it while others are not. So far I have found that SVM and K-means are susceptible to feature scaling while Linear…
am10
-1
votes
2 answers

Random Forest and Python

I have a Random Forest model for a dataset with 72 features. The objective is to find the feature importances and use them for feature selection. rf = RandomForestRegressor(n_estimators=XXX) rf.fit(X, y) I am not able to get the list of predictors…
Amrita
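One common way to see which predictor each importance belongs to is to pair `rf.feature_importances_` with the column names of the training frame. The sketch below assumes `X` is a pandas DataFrame; the four column names, the synthetic target, and `n_estimators=100` are placeholders, not values from the question.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Synthetic frame with hypothetical column names (assumption).
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(300, 4)), columns=["f1", "f2", "f3", "f4"])
y = 3.0 * X["f1"] + 0.5 * X["f2"] + rng.normal(scale=0.1, size=300)

rf = RandomForestRegressor(n_estimators=100, random_state=0)
rf.fit(X, y)

# feature_importances_ follows the column order of X, so the column
# names can be used directly as the index.
importances = pd.Series(rf.feature_importances_, index=X.columns)
importances = importances.sort_values(ascending=False)
print(importances)
```

The sorted Series can be sliced (for example `importances.head(10).index`) to pick the top predictors for a reduced model.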
-1
votes
1 answer

Max-min Markov blanket feature selection: R code error

I am using the Max-min Markov blanket algorithm for variable selection in R from the MXM package. Following is my code: library(MXM) dataset = read.table('data.txt', na.string = c("", "NA"), sep = '\t', header = FALSE) dataset = dataset[,…
MD Abid Hasan
-1
votes
1 answer

Feature selection using correlation

I'm doing feature selection to train my Machine Learning (ML) models using correlation. I trained each model (SVM, NN, RF) with all features and did 10-fold cross-validation to obtain the mean accuracy score. Then I removed features which has…
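The excerpt is cut off before it states the removal criterion, so the following is only one plausible reading: drop one feature from every pair whose absolute Pearson correlation exceeds a threshold. The helper name `drop_correlated`, the 0.9 threshold, and the toy columns are all hypothetical.

```python
import numpy as np
import pandas as pd

def drop_correlated(df, threshold=0.9):
    """Drop the later column of each pair with |correlation| > threshold."""
    corr = df.corr().abs()
    # Keep only the strict upper triangle so each pair is checked once.
    mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
    upper = corr.where(mask)
    to_drop = [col for col in upper.columns if (upper[col] > threshold).any()]
    return df.drop(columns=to_drop), to_drop

# Toy data (assumption): "b" is almost a rescaled copy of "a".
rng = np.random.default_rng(0)
a = rng.normal(size=200)
df = pd.DataFrame({
    "a": a,
    "b": 2 * a + rng.normal(scale=0.01, size=200),
    "c": rng.normal(size=200),
})

reduced, dropped = drop_correlated(df)
print(dropped, list(reduced.columns))
```

The reduced frame can then go through the same 10-fold cross-validation to compare accuracy against the full feature set.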
-1
votes
1 answer

How can we do feature selection on JSON data?

I have a large dataset in JSON format from which I want to extract the important attributes which capture the most variance. I want to extract these attributes to build a search engine on the dataset with these attributes being the hash key. The main…
-1
votes
1 answer

Features (Attributes) ranking

I have a dataset with items and features (attributes). Each item has some features. The total number of features is ~400. I want to rank the features based on their importance. I am not looking for classification, I am looking for features…
mbayomi
-1
votes
1 answer

The number of hidden nodes in an autoencoder with a small number of features

I have a data set which has 2 features and 10000 samples. I would like to convert (integrate) these two features into one feature for further analysis, so I want to use a feature extraction method. As the relationship between the two features is not…