How do I deal with class imbalance when using sparklyr with MLlib?

Asked Dec 05 '22 at 01:25

Active Dec 05 '22 at 22:09

I have a severe class imbalance: the positive class is about 3% of the data, or roughly 6,000 rows in absolute terms. I'm currently using sparklyr with Spark MLlib algorithms. Some of the native Databricks ML tooling handles class imbalance through a class weight parameter. Is something like that available in sparklyr? I'm currently using ml_random_forest_classifier to classify a dichotomous outcome. Thanks.

https://docs.databricks.com/machine-learning/automl/how-automl-works.html#imbalanced-dataset-support-for-classification-problems
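To make clear what I mean by class weighting, here is a minimal sketch of the idea, not working code from my project. The helper `add_class_weights` and the column names `label` and `class_weight` are hypothetical names I made up for illustration. It adds a per-row weight inversely proportional to class frequency, using only dplyr verbs so that it should run on a local data frame as well as on a Spark table. The per-class counts are collected locally (one row per class) and joined back with `copy = TRUE`.

```r
library(dplyr)

# Hypothetical helper: add a `class_weight` column using "balanced" weights,
#   weight(class) = n_total / (n_classes * n_class)
# so that every class contributes the same total weight.
add_class_weights <- function(tbl, label_col = "label") {
  # Per-class row counts; this is tiny, so bring it into local memory.
  counts <- tbl %>%
    group_by(!!rlang::sym(label_col)) %>%
    summarise(n_class = n()) %>%
    collect() %>%
    ungroup() %>%
    mutate(n_class = as.numeric(n_class))

  # Compute the weights locally: one row per class.
  weights <- counts %>%
    mutate(class_weight = sum(n_class) / (n() * n_class)) %>%
    select(all_of(label_col), class_weight)

  # Join the weights back onto the original rows.
  # copy = TRUE lets the local lookup table be joined to a remote table.
  left_join(tbl, weights, by = label_col, copy = TRUE)
}

# Local illustration with a 3% positive rate.
toy <- tibble(label = c(rep(0, 97), rep(1, 3)), x = seq_len(100))
toy_weighted <- add_class_weights(toy)

# Assumed sparklyr usage (not run here): `sdf` is a hypothetical Spark table
# with columns label, x1 and x2.
# weighted <- add_class_weights(sdf, "label")
# fit <- ml_logistic_regression(weighted, label ~ x1 + x2,
#                               weight_col = "class_weight")
```

As far as I can tell, ml_logistic_regression accepts a weight column through `weight_col`. Whether ml_random_forest_classifier can use such a column is exactly what I'm unsure about. Note that the formula has to list the features explicitly, otherwise `class_weight` would be picked up as a predictor.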

Reproducible code is in my earlier question, "Sparklyr Spark ML Feature Importance after feature transformation".