
I am new to data science & machine learning, so I'll write my question in detail.

I have an imbalanced dataset (binary classification), and I want to apply these methods using the Weka platform:

  1. 10-Fold cross validation.
  2. Oversampling to balance the data.
  3. A Wrapper feature selection method.
  4. Six classifiers, comparing their performance.

I want to apply them under these conditions:

  1. Balancing the data before applying a feature selection method (reference).
  2. Balancing the data during cross validation (reference).

What is the correct procedure?

I've written a post below with a suggested procedure.

Muneera
  • Your question may be outside the scope of this community, because this community focuses on the code itself. I think [crossValidated](https://stats.stackexchange.com/) is better suited for this, or maybe [Artificial intelligence](https://ai.stackexchange.com/). – StandardIO Feb 03 '23 at 06:15
  • Thank you very much. I am really sorry. I'll post my question in crossValidated – Muneera Feb 03 '23 at 07:03

1 Answer


Is this procedure correct?

First, use a feature selection method to reduce the number of features:

  1. From the Preprocess tab: balance the entire dataset.
  2. From the Select attributes tab: apply a feature selection method to the balanced dataset.
  3. From the Preprocess tab: remove the attributes that were not selected in step #2 from the original imbalanced dataset, and save this new copy of the dataset for use in the steps below.

Then, apply cross validation and balancing to the new copy of the dataset:

  1. From the Classify tab: choose 10-fold cross validation.
  2. Choose FilteredClassifier and edit its properties:
  • classifier: select each classifier in turn.
  • filter: Resample.
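
The second part of the procedure (balancing inside each cross validation fold) can be sketched in plain Java, without the Weka API. This is only an illustration under stated assumptions: the class `FoldOversamplingSketch`, its methods, and the toy label array are hypothetical; it works on labels and indices rather than feature vectors; it does not stratify the folds; and it uses simple random duplication of the minority class as a stand-in for whatever the resampling filter is configured to do. What it shows is that only the nine training folds are balanced, while the held-out fold keeps its original class ratio.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical illustration: labels only (0 = majority, 1 = minority).
// A real pipeline would carry the feature vectors alongside each index.
class FoldOversamplingSketch {

    // Shuffle the instance indices and deal them into k folds (not stratified).
    static List<List<Integer>> makeFolds(int n, int k, Random rng) {
        List<Integer> idx = new ArrayList<>();
        for (int i = 0; i < n; i++) idx.add(i);
        Collections.shuffle(idx, rng);
        List<List<Integer>> folds = new ArrayList<>();
        for (int f = 0; f < k; f++) folds.add(new ArrayList<>());
        for (int i = 0; i < n; i++) folds.get(i % k).add(idx.get(i));
        return folds;
    }

    // Randomly duplicate indices of the smaller class until both classes
    // have the same number of entries. The input list is not modified.
    static List<Integer> oversample(List<Integer> train, int[] labels, Random rng) {
        List<Integer> pos = new ArrayList<>();
        List<Integer> neg = new ArrayList<>();
        for (int i : train) (labels[i] == 1 ? pos : neg).add(i);
        List<Integer> minority = pos.size() < neg.size() ? pos : neg;
        List<Integer> majority = pos.size() < neg.size() ? neg : pos;
        List<Integer> balanced = new ArrayList<>(train);
        if (minority.isEmpty()) return balanced; // nothing to duplicate
        int need = majority.size() - minority.size();
        for (int j = 0; j < need; j++) {
            balanced.add(minority.get(rng.nextInt(minority.size())));
        }
        return balanced;
    }

    public static void main(String[] args) {
        // Toy data: 100 instances, 10 of them in the minority class.
        int[] labels = new int[100];
        for (int i = 0; i < 10; i++) labels[i] = 1;

        Random rng = new Random(42);
        List<List<Integer>> folds = makeFolds(labels.length, 10, rng);

        for (int f = 0; f < folds.size(); f++) {
            // Training data = every fold except the held-out one.
            List<Integer> train = new ArrayList<>();
            for (int g = 0; g < folds.size(); g++) {
                if (g != f) train.addAll(folds.get(g));
            }
            // Balance the training part only; the test fold stays as it is.
            List<Integer> balanced = oversample(train, labels, rng);
            List<Integer> test = folds.get(f);
            System.out.printf("fold %d: train %d -> %d after oversampling, test %d (untouched)%n",
                    f, train.size(), balanced.size(), test.size());
            // A classifier would be trained on `balanced` and evaluated on `test` here.
        }
    }
}
```

Balancing the training part separately in every fold means that duplicated minority instances never end up in the fold used for evaluation, which is the purpose of wrapping the classifier and the filter together in step 2.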
Muneera