I'm trying to do some basic multi-class classification (one label per row) in Azure ML. My data is in the following format:
value_x value_y label
x1 y1 label1
x2 y2 label1
x3 y3 label2
.....
My problem is that certain labels (out of a total of five) are overrepresented: about 40% of the data is label1, about 20% is label2, and the rest are around 10% each.
I would like to draw a sample from this data to train my model, so that each label is represented in equal amounts.
I tried the stratification option in the Sampling module on the label column, but that just gives me a sample with the same label distribution as the initial dataset.
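To make the goal concrete, here is a minimal sketch in plain Python (outside Azure ML, not a module) of the kind of balanced sample described above: every label is downsampled to the size of the rarest label. The function name `balanced_sample`, the toy rows, and the assumption that the label sits in the third column are all made up for illustration.

```python
import random
from collections import Counter, defaultdict


def balanced_sample(rows, label_index=2, seed=0):
    """Downsample every label to the size of the rarest label."""
    rng = random.Random(seed)

    # Group the rows by their label value.
    by_label = defaultdict(list)
    for row in rows:
        by_label[row[label_index]].append(row)

    # The rarest label decides how many rows each label contributes.
    n = min(len(group) for group in by_label.values())

    sample = []
    for group in by_label.values():
        sample.extend(rng.sample(group, n))  # sampling without replacement

    rng.shuffle(sample)  # avoid rows being ordered by label
    return sample


# Hypothetical toy data with roughly the skew described in the question.
rows = (
    [(i, i, "label1") for i in range(40)]
    + [(i, i, "label2") for i in range(20)]
    + [(i, i, "label3") for i in range(10)]
    + [(i, i, "label4") for i in range(10)]
    + [(i, i, "label5") for i in range(10)]
)

balanced = balanced_sample(rows)
counts = Counter(row[2] for row in balanced)
print(counts)
```

With this toy data every label ends up with the same number of rows, whereas a stratified sample would keep the original 40/20/10/10/10 proportions.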
Any idea how I could do this with a module?