Questions tagged [feature-selection]

In machine learning, feature selection is the process of selecting a subset of the most relevant features to use in constructing a model.

Feature selection is an important step to remove irrelevant or redundant features from our data. For more details, see Wikipedia.

1533 questions
0
votes
1 answer

feature selection

I have document-term data with terms as dimensions. I have to perform feature selection on the terms and I intend to use Mutual Information as the measure to perform feature selection. My doubt here is that after calculating the mutual information…
pooja
  • 73
  • 1
  • 6
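
A minimal sketch of the kind of ranking this question describes: score each term by its mutual information with the class label, then keep the highest-scoring terms. The toy corpus, the binary term-presence encoding, and the cutoff of two terms are all assumptions made for illustration; the question does not say how the data is encoded or how many terms to keep.

```python
import math
from collections import Counter

def mutual_information(term_present, labels):
    """Mutual information (in bits) between a binary term indicator and class labels."""
    n = len(labels)
    joint = Counter(zip(term_present, labels))
    term_counts = Counter(term_present)
    label_counts = Counter(labels)
    mi = 0.0
    for (t, c), n_tc in joint.items():
        # p(t, c) * log2(p(t, c) / (p(t) * p(c))); empty cells never appear in `joint`
        mi += (n_tc / n) * math.log2(n_tc * n / (term_counts[t] * label_counts[c]))
    return mi

# Hypothetical toy corpus: each document is a set of terms plus a class label.
docs = [
    ({"goal", "match"}, "sport"),
    ({"goal", "team"}, "sport"),
    ({"vote", "team"}, "politics"),
    ({"vote", "law"}, "politics"),
]
labels = [label for _, label in docs]
vocabulary = sorted(set().union(*(terms for terms, _ in docs)))

# One score per term, computed from the term's presence/absence in each document.
scores = {
    term: mutual_information([term in terms for terms, _ in docs], labels)
    for term in vocabulary
}

# Rank terms by score and keep the top k (k = 2 is an arbitrary choice here).
ranked = sorted(vocabulary, key=lambda term: scores[term], reverse=True)
top_k = ranked[:2]
```

With real document-term data the cutoff is usually chosen by validating a downstream classifier rather than fixed in advance.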
0
votes
1 answer

sentiment analysis, feature selection

I want to know what the appropriate tools are for each step of sentiment analysis: removing stopwords, stemming, vector representation of text, feature selection, classification, and how to pass from the vector representation of text to feature selection…
Manel Ayadi
  • 139
  • 3
  • 13
0
votes
1 answer

Is it possible to use SVM to learn a training sample with an input of "Feature Matrix" rather than a "Feature Vector"?

Is it possible to use SVM to learn a training sample with an input of "Feature Matrix" rather than a "Feature Vector"? I need to classify XML documents by representing each document as a Feature Matrix. Typically, a feature vector is used to train…
0
votes
1 answer

fetch the selected variables in 'step' method in R

I am removing unnecessary/spurious variables from my data using the 'step' function. I am using the following code: state.x77 st = as.data.frame(state.x77) colnames(st)[4] = "Life.Exp" # no spaces in variable names, please …
user1140126
  • 2,621
  • 7
  • 29
  • 34
0
votes
2 answers

Best Feature Selection Algorithm For Document Classification

I am working on a document classification project. I am using tf-idf and centroid algorithms, but I need a dictionary to use those algorithms. I have tried information gain for making a dictionary, but I think the result is not satisfactory. Have you…
Yavuz
  • 1,257
  • 1
  • 16
  • 32
0
votes
2 answers

What is feed-forward wrapper method for feature selection?

For a school project I need to choose a dataset from the UCI repository and classify the data with KNN after processing it with "feed forward wrapper" feature selection. Googling for "feed forward wrapper" yields nothing... Can someone explain to me…
fgungor
  • 479
  • 4
  • 15
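
A rough sketch, assuming that "feed forward wrapper" refers to sequential forward selection used as a wrapper method (the question does not confirm this). The wrapper greedily adds whichever feature most improves the accuracy of the wrapped classifier, here a 1-nearest-neighbour classifier scored by leave-one-out accuracy. The dataset, the function names, and the stopping rule are hypothetical.

```python
def loo_knn_accuracy(X, y, features):
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier on the given columns."""
    correct = 0
    for i, xi in enumerate(X):
        best_dist, best_label = None, None
        for j, xj in enumerate(X):
            if i == j:
                continue  # leave the query point out of its own neighbour search
            dist = sum((xi[f] - xj[f]) ** 2 for f in features)
            if best_dist is None or dist < best_dist:
                best_dist, best_label = dist, y[j]
        correct += best_label == y[i]
    return correct / len(X)

def forward_selection(X, y, max_features):
    """Greedily add the feature that most improves the wrapped classifier's accuracy."""
    selected, best_score = [], 0.0
    remaining = list(range(len(X[0])))
    while remaining and len(selected) < max_features:
        # Score every candidate feature by the accuracy obtained when it is added.
        scored = [(loo_knn_accuracy(X, y, selected + [f]), f) for f in remaining]
        score, feature = max(scored, key=lambda pair: pair[0])
        if score <= best_score:
            break  # no remaining feature improves the classifier
        selected.append(feature)
        remaining.remove(feature)
        best_score = score
    return selected, best_score

# Hypothetical toy data: column 0 separates the classes, column 1 is noise.
X = [[0.0, 5.0], [0.2, 1.0], [0.1, 9.0], [1.0, 5.2], [1.2, 0.9], [0.9, 9.1]]
y = [0, 0, 0, 1, 1, 1]

selected, score = forward_selection(X, y, max_features=2)
```

On this toy data only column 0 is kept, because adding the noisy column lowers the leave-one-out accuracy.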
0
votes
1 answer

Wavelet Packet Decomposition, Feature Selection and SVM

I want to know more about a fault detection model using Wavelet Packet Decomposition, Feature Selection and SVM. One can read some related papers…
minhbsu
  • 113
  • 3
  • 10
-1
votes
0 answers

Can I use chi2 test and pearson correlation coefficient in dataset containing both numerical and categorical variables?

I have a dataset which contains both numerical and categorical variables. Can I use the two techniques mentioned separately to select features? For example, A, B, C, D, E are my columns, where A and B are categorical, so here I'll use the chi2 test, whereas…
-1
votes
0 answers

How to combine the "gain" feature importance metric for multiple features from XGBoost?

How to combine "gain" feature importance for multiple features with XGBoost? For example, I would like to compare the importance of all features from both hands (i.e., X and Y coordinates of ten fingers, so 20 features) to the importance of all…
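
One possible approach, sketched under assumptions: XGBoost's "gain" importance is the average gain per split and "weight" is the number of splits that use a feature, so multiplying the two recovers a per-feature total gain that can be summed over a group. The dictionaries below are hypothetical values shaped like the output of `Booster.get_score`, and the feature names and grouping are invented for illustration. Whether a summed total is the right basis for comparison depends on the analysis.

```python
# Hypothetical values, shaped like Booster.get_score(importance_type="gain")
# and Booster.get_score(importance_type="weight").
avg_gain = {"left_x0": 4.0, "left_y0": 2.0, "right_x0": 3.0, "other_feature": 10.0}
n_splits = {"left_x0": 5, "left_y0": 10, "right_x0": 4, "other_feature": 2}

# Hypothetical grouping of feature names into the sets to be compared.
groups = {
    "hands": ["left_x0", "left_y0", "right_x0"],
    "other": ["other_feature"],
}

def group_total_gain(features):
    # Average gain times number of splits gives the total gain of one feature.
    # Features never used in a split are missing from the dicts and count as 0.
    return sum(avg_gain.get(f, 0.0) * n_splits.get(f, 0) for f in features)

totals = {name: group_total_gain(feats) for name, feats in groups.items()}

# Normalise so the groups can be compared as shares of the overall gain.
grand_total = sum(totals.values())
shares = {name: value / grand_total for name, value in totals.items()}
```

Summing the raw "gain" values directly would instead add averages over different numbers of splits, which is why this sketch weights by split count first.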
-1
votes
1 answer

Feature Importance cannot tell positive or negative magnitude

I am doing a data analysis/machine learning project. The main goal is to identify which component is causing the problem in a large dataset. The dataset contains many rows; each row represents a single test, and each test contains information such…
-1
votes
1 answer

SARIMA with multiple features and n-step ahead in sample

I have a problem with my SARIMA. I want to create a model that forecasts a value a day ahead based on multiple features. I used the following code, but I do not know how to incorporate all the other features ('data' in the code): data =…
J1999
  • 3
  • 2
-1
votes
1 answer

Remove features from dataset

I'm conducting an experiment on blood test results data, trying to predict the probability that a patient has a certain disease. Using the blood test results I have reached over 2000 features, and I'm trying to find a good way to eliminate features that…
Dana
  • 15
  • 3
-1
votes
1 answer

How to reduce high number of categorical variables from your dataset while training ML algorithm

I have a dataset that has a high number of categorical variables. For example, the dataset currently has 37 categorical variables. If I perform one-hot encoding or any other encoding, it will explode the number of columns and overall column…
-1
votes
1 answer

Why does sklearn SelectFromModel estimator_.coef_ return a 2D array?

I asked on Cross Validated before, but it seems more appropriate to ask here. My data df_X has 11 features, and y is the multi-class label (3, 4, 5, 6, 7, 8 in the samples). I used a multi-class SVM to estimate the importance of features. estimator_.coef_…
user6703592
  • 1,004
  • 1
  • 12
  • 27
-1
votes
1 answer

Which feature selection technique for NLP does this represent?

I have a dataset that came from NLP on technical documents. My dataset has 60,000 records. There are 30,000 features in the dataset, and each value is the number of times that word/feature appeared. Here is a sample of the dataset: RowID …
asmgx
  • 7,328
  • 15
  • 82
  • 143