
I am referring to Google's machine learning Data Preparation course. In this lecture, https://developers.google.com/machine-learning/data-prep/construct/sampling-splitting/imbalanced-data, about solving the class imbalance problem, the technique mentioned is to first downsample and then upweight. The lecture covers the theory, but I couldn't find a practical implementation. Can someone guide me?

Fariha Abbasi

1 Answer


Upweighting is done to calibrate the probabilities produced by probabilistic classifiers after downsampling, so that the output of the predict_proba method can be interpreted directly as a confidence level.

A Python implementation of two calibration methods is provided here: https://scikit-learn.org/stable/auto_examples/calibration/plot_calibration.html#sphx-glr-auto-examples-calibration-plot-calibration-py

More details about probability calibration are provided here: https://scikit-learn.org/stable/modules/calibration.html
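Putting the two steps from the course together, here is a minimal sketch of downsample-then-upweight. The dataset, class sizes, and downsampling factor are all made up for illustration; the key idea is that each example kept from the downsampled class receives a sample weight equal to the downsampling factor, which can be passed to scikit-learn's `sample_weight` parameter:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Hypothetical imbalanced dataset: 1000 negatives vs. 50 positives.
X_neg = rng.normal(0.0, 1.0, size=(1000, 2))
X_pos = rng.normal(2.0, 1.0, size=(50, 2))

# Step 1: downsample the majority (negative) class by a factor of 10.
factor = 10
keep = rng.choice(len(X_neg), size=len(X_neg) // factor, replace=False)
X_neg_down = X_neg[keep]

X = np.vstack([X_neg_down, X_pos])
y = np.concatenate([np.zeros(len(X_neg_down)), np.ones(len(X_pos))])

# Step 2: upweight the downsampled class by the same factor, so the
# model's predicted probabilities still reflect the true base rate.
sample_weight = np.where(y == 0, float(factor), 1.0)

clf = LogisticRegression()
clf.fit(X, y, sample_weight=sample_weight)
```

After training, `clf.predict_proba` should give probabilities consistent with the original class distribution, which is the point of upweighting rather than just downsampling alone.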

  • This will solve the problem of class imbalance, right? You can also check my code-specific question here: https://stackoverflow.com/questions/57375168/how-to-apply-class-weights-in-linear-classifier-for-binary-classification – Fariha Abbasi Aug 06 '19 at 11:49
  • No, the above code won't solve the class imbalance problem on its own. Upweighting is done to correct the probabilities, and it is applied after downsampling. – Hitesh Laddha Aug 07 '19 at 04:41