Questions tagged [feature-selection]

In machine learning, this is the process of selecting a subset of the most relevant features to use in constructing your data model.

Feature selection is an important step to remove irrelevant or redundant features from our data. For more details, see Wikipedia.

1533 questions
-1
votes
1 answer

TypeError: '(slice(None, None, None), array([ True, False, ... True, True]))' is an invalid key

Trying to use BPSO for feature selection but getting a Type error: TypeError: '(slice(None, None, None), array([ True, False, False, True, False, False, False, True, True, False, True, False, False, False, False, True, False, True, True, …
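An error of this form typically appears when NumPy-style indexing such as `X[:, mask]` is applied to a pandas DataFrame. The excerpt does not show the code, so the following is only a sketch under that assumption; `X` and `mask` are hypothetical stand-ins for the asker's data and the BPSO particle position.

```python
import numpy as np
import pandas as pd

# Hypothetical data: 3 samples, 4 features.
X = pd.DataFrame(np.arange(12).reshape(3, 4), columns=list("abcd"))
# Hypothetical boolean mask, e.g. one particle position from a binary PSO.
mask = np.array([True, False, True, False])

# X[:, mask] is NumPy syntax; on a DataFrame it is treated as a single
# (slice, array) key and rejected (the exception type depends on the
# pandas version).

# Option 1: stay in pandas and index columns with .loc.
X_sel_df = X.loc[:, mask]

# Option 2: convert to a NumPy array first, then use NumPy indexing.
X_sel_np = X.to_numpy()[:, mask]
```

Either option keeps only the columns where the mask is `True`.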
-1
votes
1 answer

Backward stepwise selection to choose an optimal subset of the predictors with the AUC as a criterion

I am looking to perform a backward feature selection process on a logistic regression with the AUC as a criterion. For building the logistic regression I used the scikit library, but unfortunately this library does not seem to have any methods for…
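For reference, recent scikit-learn releases (0.24 and later) include `SequentialFeatureSelector`, which supports backward selection with a scoring metric. A minimal sketch, assuming synthetic data and an arbitrary target of 4 features:

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for the asker's data.
X, y = make_classification(
    n_samples=200, n_features=8, n_informative=3, random_state=0
)

# Start from all features and drop one at a time, keeping the subset
# with the best cross-validated ROC AUC at each step.
selector = SequentialFeatureSelector(
    LogisticRegression(max_iter=1000),
    n_features_to_select=4,  # assumption: the stopping point is chosen by hand
    direction="backward",
    scoring="roc_auc",
    cv=3,
)
selector.fit(X, y)

selected = selector.get_support()   # boolean mask over the 8 columns
X_reduced = selector.transform(X)   # data restricted to the kept columns
```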
-1
votes
1 answer

Why should we use Lasso over Linear regression for feature selection in machine learning?

While selecting features in machine learning, one can use Lasso regression to identify the least important feature by finding the smallest coefficient, but we can do the same using linear regression: Y = x0 + x1b1 + x2b2 + ... + xnbn. Here…
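One practical difference is that the L1 penalty in Lasso can drive coefficients to exactly zero, while ordinary least squares generally leaves small nonzero values. A small sketch on synthetic data (the data, the `alpha` value, and the variable names are assumptions for illustration):

```python
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
# Only the first two features influence the target.
y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.1, size=200)

ols = LinearRegression().fit(X, y)
lasso = Lasso(alpha=0.1).fit(X, y)

# OLS assigns small but nonzero weights to the irrelevant features;
# Lasso sets them to exactly zero, which gives a direct selection rule.
ols_coef = ols.coef_
lasso_coef = lasso.coef_
```

Comparing coefficient magnitudes in plain linear regression is also sensitive to feature scale, so standardizing the inputs first is usually advisable in both cases.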
-1
votes
1 answer

Best way of matching the feature spaces in this classification problem using Tfidf and SVM

I am training a model to detect spam/ham emails, and feature selecting by doing: t = TfidfVectorizer(max_features=num_feature) t.fit_transform(spam_corpus) spam_features = t.get_feature_names() t.fit_transform(ham_corpus) ham_features =…
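One common way to keep spam and ham in the same feature space is to fit a single vectorizer on the combined corpus and reuse it for new text. This is a sketch of that idea, not necessarily what the asker ended up doing; the toy corpora and `max_features` value are assumptions.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Toy corpora standing in for the real data.
spam_corpus = ["win cash now", "free prize claim now"]
ham_corpus = ["meeting at noon", "see you at lunch"]

# Fit ONE vectorizer on all documents so every row shares one vocabulary.
vectorizer = TfidfVectorizer(max_features=50)
X = vectorizer.fit_transform(spam_corpus + ham_corpus)
y = [1] * len(spam_corpus) + [0] * len(ham_corpus)

# New emails are mapped into the same columns with transform (no refit).
X_new = vectorizer.transform(["claim your free lunch"])
```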
-1
votes
1 answer

Feature Selection for gene expression data

Can someone please suggest which feature selection techniques I should use for gene classification?
-1
votes
1 answer

Checking the relationship between two categorical (object dtype) columns in Python

In my Pandas DataFrame there are two categorical variables: one is the target, which has 2 unique values, and the other is the feature, which has 300 unique values. Now I want to check the relationship between the two variables using a chi-square test. Now the…
geek • 73 • 2 • 12
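A chi-square test of independence between two categorical columns can be sketched with a contingency table and `scipy.stats.chi2_contingency`. The column names and values below are hypothetical.

```python
import pandas as pd
from scipy.stats import chi2_contingency

# Hypothetical data: a 3-level feature and a binary target.
df = pd.DataFrame({
    "feature": ["a", "a", "b", "b", "c", "c", "a", "b", "c", "a"],
    "target":  [0,   1,   0,   0,   1,   1,   0,   1,   1,   0],
})

# Rows are feature levels, columns are target classes, cells are counts.
table = pd.crosstab(df["feature"], df["target"])

# Returns the statistic, the p-value, the degrees of freedom, and the
# expected counts under independence.
chi2_stat, p_value, dof, expected = chi2_contingency(table)
```

With a feature that has hundreds of levels, many cells of the table will have very low expected counts, which weakens the chi-square approximation; grouping rare levels first is a common workaround.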
-1
votes
1 answer

Using cross-validation to calculate feature importance "Some Questions"

I am currently working on a project. I already selected my features and want to check their importance. I have some questions if anyone can help me please. 1- Does it make sense if I use RandomForestClassifier with cross-validation to calculate the…
-1
votes
1 answer

SVM-RFE over different levels of features

Let's assume that I have data with 1000 features. I want to apply SVM-RFE on this data, where each time 10% of the features are removed. How can one get the accuracy at every level of the elimination stages? For example, I want to get…
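One way to obtain an accuracy per elimination level is to run RFE inside a pipeline for several target feature counts and cross-validate each. This is a sketch on synthetic data with 20 features rather than 1000; the levels `(15, 10, 5)` are arbitrary. Note that, as far as I understand scikit-learn's `RFE`, a fractional `step` removes that fraction of the original feature count per iteration, not of the remaining features.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Synthetic stand-in for the asker's data.
X, y = make_classification(
    n_samples=150, n_features=20, n_informative=5, random_state=0
)

scores = {}
for n_keep in (15, 10, 5):
    # RFE needs a linear kernel so that SVC exposes coef_ for ranking.
    model = make_pipeline(
        RFE(SVC(kernel="linear"), n_features_to_select=n_keep, step=0.1),
        SVC(kernel="linear"),
    )
    # Selection happens inside each CV fold, avoiding leakage.
    scores[n_keep] = cross_val_score(model, X, y, cv=3).mean()
```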
-1
votes
2 answers

PCA features do not match original features

I am trying to reduce the feature dimensions using PCA. I have been able to apply PCA to my training data, but am struggling to understand why the reduced feature set (X_train_pca) shares no similarities with the original features…
Espresso • 740 • 13 • 32
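PCA outputs are linear combinations of all original features, so the transformed columns are not expected to resemble any single input column. A short sketch on random data (the array shapes are assumptions) that reproduces the transform by hand:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X_train = rng.normal(size=(100, 5))  # hypothetical training data

pca = PCA(n_components=2)
X_train_pca = pca.fit_transform(X_train)

# Each row of components_ holds the weights of one principal component
# over the 5 original features.
weights = pca.components_

# The transform is: center the data, then project onto the components.
manual = (X_train - pca.mean_) @ weights.T
```

Inspecting `weights` shows how strongly each original feature contributes to each component.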
-1
votes
1 answer

Can different summary metrics of a single feature be used as features for k-means clustering?

I have a scenario where I want to understand customers' behavior patterns and group them into different segments/clusters for an e-commerce platform. I chose an unsupervised machine learning algorithm, k-means clustering, to accomplish this…
-1
votes
1 answer

Feature selection using statistical model

Problem statement: I am working on a problem where I have to predict whether a customer will opt for a loan or not. I have converted all available data types (object, int) into integers, and now my data looks like below. The highlighted column is my Target…
-1
votes
1 answer

How to see correlation between features in scikit-learn?

I am developing a model that predicts whether an employee keeps their job or leaves the company. The features are as below…
Ishaan • 1,249 • 15 • 26
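Pairwise correlation is usually computed with pandas rather than scikit-learn itself. A minimal sketch, with hypothetical numeric columns in place of the features elided above:

```python
import pandas as pd

# Hypothetical numeric features.
df = pd.DataFrame({
    "a": [1, 2, 3, 4, 5],
    "b": [2, 4, 6, 8, 10],   # exactly 2 * a
    "c": [5, 3, 4, 1, 2],
})

# Pearson correlation between every pair of columns.
corr = df.corr()
```

Highly correlated pairs (such as `a` and `b` here) are candidates for dropping one of the two features.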
-1
votes
1 answer

transform() takes 2 positional arguments but 3 were given

Following is my code:- from sklearn.feature_selection import SelectKBest, chi2, f_regression X_train_new = SelectKBest(score_func=chi2,k=2000).fit_transform(X_train_2,y_train) X_cv_new =…
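The error message suggests that `transform` was called with both `X` and `y`; the excerpt is cut off before that call, so this is an assumption. A sketch of the usual pattern, on synthetic data with a smaller `k`: fit the selector once on the training set, then call `transform` with the features only.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=100, n_features=10, random_state=0)
X = np.abs(X)  # chi2 requires non-negative feature values

X_train, X_cv, y_train, y_cv = train_test_split(X, y, random_state=0)

selector = SelectKBest(score_func=chi2, k=3)

# fit_transform takes X and y because scoring needs the labels.
X_train_new = selector.fit_transform(X_train, y_train)

# transform takes X only; it reuses the columns chosen during fit.
X_cv_new = selector.transform(X_cv)
```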
-1
votes
1 answer

What dimension reduction techniques can I try on my data (0-1 features + tfidf scores as features) before feeding it into SVM

I have about 8000 features measuring a two-level response variable, i.e. the output can belong to class 1 or 0. The 8000 features consist of about 3000 features with 0-1 values and about 5000 features (which are basically words from text data and their…
-1
votes
2 answers

How to select features for clustering?

I had time-series data, which I have aggregated into 3 weeks and transposed to features. Now I have features: A_week1, B_week1, C_week1, A_week2, B_week2, C_week2, and so on. Some of the features are discrete, some continuous. I am thinking of…