Questions tagged [feature-selection]

In machine learning, this is the process of selecting a subset of the most relevant features for constructing your model.

Feature selection is an important step to remove irrelevant or redundant features from our data. For more details, see Wikipedia.
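A minimal sketch of what feature selection looks like in practice, using scikit-learn's `SelectKBest` on the bundled iris data (the dataset and `k=2` are illustrative choices, not taken from any question below):

```python
# Univariate feature selection: keep the k features with the highest
# ANOVA F-scores against the class labels. Assumes scikit-learn is installed.
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, f_classif

X, y = load_iris(return_X_y=True)          # X has 4 features
selector = SelectKBest(score_func=f_classif, k=2)
X_reduced = selector.fit_transform(X, y)   # keep the 2 highest-scoring features

print(X.shape, "->", X_reduced.shape)      # (150, 4) -> (150, 2)
```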

1533 questions
-1 votes • 1 answer

training set features different from test set features in prediction

I have two data sets, a train set and a test set, and I want to predict on them. My train data set has these features: ID, name, age, Time of recruitment, Time fired, status. My test data set has these features: ID, name, age, Time…
-1 votes • 1 answer

Dimension Reduction for Clustering in R (PCA and other methods)

Let me preface this: I have looked extensively on this matter and I've found several intriguing possibilities to look into (such as this and this). I've also looked into principal component analysis and I've seen some sources that claim it's a poor…
BlueRhapsody • 93 • 2 • 13
-1 votes • 1 answer

how to convert mix of text and numerical data to feature data in apache spark

I have a CSV of both textual and numerical data. I need to convert it to feature-vector data (Double values) in Spark. Is there any way to do that? I have seen some examples where each keyword is mapped to some double value, which is then used for the conversion. However, if…
-1 votes • 1 answer

Select important features then impute or first impute then select important features?

I have a dataset with lots of features (mostly categorical features(Yes/No)) and lots of missing values. One of the techniques for dimensionality reduction is to generate a large and carefully constructed set of trees against a target attribute and…
Karup • 2,024 • 3 • 22 • 48
-1 votes • 1 answer

Getting features importance with RandomClassifier Scikit

I try to get the importance weights of every feature from my dataframe. I use this code from scikit documentation: names=['Class label', 'Alcohol', 'Malic acid', 'Ash', 'Alcalinity of ash', 'Magnesium', 'Total phenols', 'Flavanoids', 'Nonflavanoid…
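For questions like this one, a minimal sketch of reading impurity-based importances from a random forest; scikit-learn's bundled wine dataset stands in for the asker's dataframe (the column names in the excerpt suggest the classic wine data, but that is an assumption):

```python
# Fit a random forest, then rank features by their impurity-based importances.
from sklearn.datasets import load_wine
from sklearn.ensemble import RandomForestClassifier

data = load_wine()
forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(data.data, data.target)

# Pair each feature name with its importance, highest first
ranked = sorted(zip(data.feature_names, forest.feature_importances_),
                key=lambda t: t[1], reverse=True)
for name, score in ranked[:5]:
    print(f"{name:30s} {score:.3f}")
```

The importances are normalized to sum to 1, so each value can be read as a relative share of the forest's total split quality.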
-1 votes • 1 answer

Choosing Attributes for Data Mining Algorithm

I currently need to do risk-analysis data mining on a dataset. This dataset has around 120 attributes. Although I can use common sense, is there any systematic data-reduction methodology that can guide us in choosing which attributes are…
Dino • 781 • 3 • 14 • 32
-1 votes • 1 answer

Selecting samples for supervised machine learning

How does one select a sample size and sample set (for training and testing) for a binary classification problem to be solved by applying supervised learning? The current implementation is based on 15 binary features which we may expand to 20 or…
-1 votes • 1 answer

What do the features given by a feature selection method mean in a binary classifier which has a cross validation accuracy of 0?

So I know that given a binary classifier, the farther away you are from an accuracy of 0.5 the better your classifier is. (I.e. A binary classifier that gets everything wrong can be converted to one which gets everything right by always inverting…
ABC • 1,387 • 3 • 17 • 28
-2 votes • 0 answers

'ps_calc_01', How does the XGBClassifier predict and calculate accuracy?

model = XGBClassifier()
model.fit(X_train[['ps_calc_01']], y_train)
y_pred = model.predict(X_test[['ps_calc_01']])
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy: %.2f%%" % (accuracy * 100.0))

I'm seeing that ('ps_calc_01') is used for…
-2 votes • 1 answer

How to use colors as features in machine learning?

I have colors in RGB form. There are 4 columns 'accent_color'-> (0.6901960784313725, 0.14901960784313725, 0.10588235294117647) 'dominant_colors' -> [(0.6470588235294118, 0.16470588235294117, 0.16470588235294117), (0.0, 0.0, 0.0), (1.0, 1.0,…
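One common way to handle a question like this is to flatten each RGB tuple into three numeric columns that a model can consume directly. A sketch, assuming a pandas dataframe; the column name `accent_color` follows the question, while the values and the `accent_r/g/b` names are made up for illustration:

```python
# Turn a column of (r, g, b) tuples into three separate numeric feature columns.
import pandas as pd

df = pd.DataFrame({
    "accent_color": [(0.69, 0.15, 0.11), (0.10, 0.20, 0.30)],
})
# Expand each 3-tuple into accent_r / accent_g / accent_b columns
rgb = pd.DataFrame(df["accent_color"].tolist(),
                   columns=["accent_r", "accent_g", "accent_b"],
                   index=df.index)
df = df.drop(columns="accent_color").join(rgb)
print(df)
```

A list-of-tuples column such as `dominant_colors` needs an extra step (e.g. keeping only the first tuple, or aggregating) before the same expansion applies.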
-2 votes • 1 answer

how to remove irrelevant features in document classification from Weka?

In Weka, text classification yields a lot of features even after applying feature selection. How can I remove the irrelevant features quickly in the Preprocess tab, rather than one by one? In text classification the number of features is high and it needs time to remove one…
-2 votes • 1 answer

I face this error: AttributeError: module 'numpy' has no attribute 'corroef'

I am trying to use correlation to extract features, but I ran into this problem. Please help me fix it: AttributeError: module 'numpy' has no attribute 'corroef'. This is my code to correlate the features: cor_list = [] feature_name =…
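The attribute error in this question comes from a typo: the NumPy function is `np.corrcoef`, not `np.corroef`. A minimal sketch:

```python
# np.corrcoef returns the Pearson correlation matrix; the off-diagonal
# entry [0, 1] is the correlation between the two input vectors.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 4.0, 6.0, 8.0])
r = np.corrcoef(x, y)[0, 1]   # Pearson correlation of x and y
print(r)                      # 1.0 for perfectly linear data
```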
-2 votes • 1 answer

Which features to drop during feature selection

During feature selection (after doing extensive feature engineering), is there any set of rules that governs which features to drop and which to keep? I know that highly correlated features should be dropped or merged into new features, however I…
SOURIN ROY • 21 • 1 • 5
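One commonly cited rule for questions like the one above is to drop one feature from each highly correlated pair. A sketch, assuming pandas and NumPy; the 0.9 threshold and the toy dataframe are illustrative choices, not from the question:

```python
# Drop one member of each pair of features whose absolute correlation
# exceeds a threshold.
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "a": [1, 2, 3, 4, 5],
    "b": [2, 4, 6, 8, 10],   # perfectly correlated with "a"
    "c": [5, 3, 4, 1, 2],
})
corr = df.corr().abs()
# Look only at the upper triangle so each pair is considered once
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
to_drop = [col for col in upper.columns if (upper[col] > 0.9).any()]
print(to_drop)   # ['b']
```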
-2 votes • 1 answer

What is "neg_mean_absolute_error" and where can I find it?

I am new to machine learning and am trying to learn feature selection from this link. They have a line of code, given below:

search = GridSearchCV(pipeline, grid, scoring='neg_mean_squared_error', n_jobs=-1, cv=cv)

But whenever I try to…
-2 votes • 1 answer

Feature selection methodology to reduce Overfit in classification model

My dataset has over 200 variables and I am running a classification model on it, which is leading to overfitting. What is suggested for reducing the number of features? I started with feature importance, however due to such a large number of…