I am trying to solve a binary classification problem with a class imbalance. I have a dataset of 210,000 records in which 92% are 0s and 8% are 1s. I am using sklearn (v 0.16) in Python for random forests.
I see there are two parameters, sample_weight and class_weight, while constructing the classifier. I am currently using the parameter class_weight="auto".
Am I using this correctly? What do class_weight and sample_weight actually do, and which one should I be using?
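To make the question concrete, here is a minimal sketch of how I understand the two parameters to relate. It assumes the "balanced" heuristic documented in later sklearn releases (weight = n_samples / (n_classes * class_count)); the exact formula behind "auto" in 0.16 may differ. The helper names `balanced_class_weights` and `per_sample_weights` are my own and are not part of sklearn, and the labels are a scaled-down toy version of my 92% / 8% split.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Hypothetical helper: weight each class by
    n_samples / (n_classes * class_count), so rare classes get larger weights."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}


def per_sample_weights(labels, class_weights):
    """Hypothetical helper: expand per-class weights into one weight per record."""
    return [class_weights[label] for label in labels]


# Toy labels with the same 92% / 8% imbalance as my dataset (scaled down).
y = [0] * 92 + [1] * 8

class_weights = balanced_class_weights(y)
sample_weights = per_sample_weights(y, class_weights)

# Total weight carried by each class after reweighting.
weight_of_zeros = sum(w for w, label in zip(sample_weights, y) if label == 0)
weight_of_ones = sum(w for w, label in zip(sample_weights, y) if label == 1)

print(class_weights)
print(weight_of_zeros, weight_of_ones)

# In sklearn, as far as I can tell, a dict like class_weights would go to the
# constructor, e.g. RandomForestClassifier(class_weight=class_weights), while a
# list like sample_weights would go to fit, e.g. clf.fit(X, y, sample_weight=sample_weights).
```

If this reading is right, a class weight is just a shortcut for giving every record of a class the same sample weight, and under this heuristic both classes end up contributing the same total weight.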