I'm trying to implement the SMOTE technique, and I would like to visualize my labels. This technique helps me address the data imbalance problem.
Say I have 1000 samples labeled 1, 2, or 3, with counts of 100, 100, and 800 respectively. After SMOTE adds synthetic samples, I will have 800, 800, and 800 samples for labels 1, 2, and 3.
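To make those counts concrete, here is a minimal sketch that draws them as two bar charts. The label arrays are made up to match the numbers above, and the "after" array simply assumes SMOTE raises every class to 800; no actual resampling happens here, and class_counts is just a hypothetical helper name.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical label arrays matching the counts described above.
y_before = np.repeat([1, 2, 3], [100, 100, 800])
# Assumed outcome of SMOTE: every class raised to the majority count.
y_after = np.repeat([1, 2, 3], [800, 800, 800])


def class_counts(y):
    """Return (labels, counts) for a 1-D array of class labels."""
    labels, counts = np.unique(y, return_counts=True)
    return labels, counts


fig, axes = plt.subplots(1, 2, figsize=(8, 3), sharey=True)
panels = zip(axes, (y_before, y_after), ("Before SMOTE", "After SMOTE"))
for ax, y, title in panels:
    labels, counts = class_counts(y)
    # Convert labels to strings so they are drawn as categories.
    ax.bar(labels.astype(str), counts)
    ax.set_title(title)
    ax.set_xlabel("label")
axes[0].set_ylabel("count")
fig.tight_layout()
```

Calling plt.show() (or fig.savefig(...)) afterwards displays the figure. With real data, the two arrays would be the label vector before resampling and the one returned by the resampler.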
Now I want to plot these labels before and after SMOTE. I have divided my dataset into train_X and train_y. I tried to adapt the iris example code to my dataset, but it failed. This is what I tried (it does not draw anything and raises this error: ValueError: 'c' argument has 154 elements, which is not acceptable for use with 'x' with size 25, 'y' with size 25.):
from matplotlib import pyplot as plt
features = train_X
target = train_y.values.ravel()
plt.scatter(features[0], features[1], alpha=0.2,
            s=100*features[3], c=target, cmap='viridis')
plt.xlabel("L1")
plt.ylabel("L2");
plt.show()
How can I plot my labels in different colors so that I can see how the values are distributed?
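For what it's worth, the error message suggests that train_X behaves like a 2-D array with 154 rows and 25 columns, so features[0] picks the first row (25 values) instead of the first column (154 values). Below is a minimal sketch under that assumption. The data is random stand-in data with those shapes, not the real dataset, and the s=100*features[3] size argument is left out for simplicity.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
# Stand-in data with the shapes implied by the error message (assumption).
train_X = rng.normal(size=(154, 25))
train_y = rng.integers(1, 4, size=154)  # labels drawn from 1, 2, 3

features = np.asarray(train_X)
target = np.asarray(train_y).ravel()

fig, ax = plt.subplots()
# features[:, 0] is the first column (one value per sample);
# features[0] would be the first row (one value per feature).
points = ax.scatter(features[:, 0], features[:, 1], c=target,
                    cmap="viridis", alpha=0.6)
ax.set_xlabel("feature 0")
ax.set_ylabel("feature 1")
# One legend entry per distinct label value.
ax.legend(*points.legend_elements(), title="label")
```

Calling plt.show() afterwards displays the plot. If train_X is a pandas DataFrame rather than an array, train_X.iloc[:, 0] selects the first column by position. Running the same code on the resampled arrays would give the "after SMOTE" picture.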