I am working on a multi-label prediction task where each label is a one-hot encoded vector such as `[1, 0, 0]`, `[0, 1, 0]`, or `[0, 0, 1]`, of type `ndarray`.
The dataset is imbalanced, so I am using SMOTE. This works and upsamples all minority classes (each one is upsampled until it holds as many records as the majority class).
Now I want to upsample to fewer records than that. According to the documentation, I can use `sampling_strategy` and provide a dict with key = class label and value = total records.
However, I cannot use the `ndarray` as a key in my dict (`TypeError: unhashable type: 'numpy.ndarray'`). What is the best way to do this? SMOTE can obviously handle these one-hot encoded vectors, so how do I get the total records in there?
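
One workaround I can think of, shown here only as a minimal sketch: convert the one-hot vectors to integer class indices with `np.argmax`, key the `sampling_strategy` dict by those integers (which are hashable), and convert the resampled labels back to one-hot afterwards. The helper names `one_hot_to_int`, `int_to_one_hot`, and `resample_with_targets` are hypothetical names of mine, not part of imbalanced-learn, and the sketch assumes every label row contains exactly one `1`.

```python
import numpy as np


def one_hot_to_int(y_onehot):
    """Map each one-hot row to its class index, e.g. [0, 1, 0] -> 1."""
    return np.argmax(y_onehot, axis=1)


def int_to_one_hot(y_int, n_classes):
    """Map class indices back to one-hot rows of length n_classes."""
    return np.eye(n_classes, dtype=int)[y_int]


def resample_with_targets(X, y_onehot, targets, random_state=0):
    """Oversample with SMOTE using a dict of {class index: total records}.

    `targets` is an assumption of this sketch: its keys are the integer
    class indices produced by one_hot_to_int, and each value is the desired
    total count for that class after resampling (it must not be smaller
    than the class's current count).
    """
    # Imported here so the numpy-only helpers above work without imblearn.
    from imblearn.over_sampling import SMOTE

    y_int = one_hot_to_int(y_onehot)
    smote = SMOTE(sampling_strategy=targets, random_state=random_state)
    X_res, y_res = smote.fit_resample(X, y_int)
    return X_res, int_to_one_hot(y_res, y_onehot.shape[1])
```

A call would then look like `resample_with_targets(X, y, {1: 300, 2: 200})`, where the counts are made-up placeholders. Classes left out of the dict should not be resampled. Note that SMOTE with its default `k_neighbors=5` needs more than five samples in every class it oversamples.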