
I'm trying to apply the SMOTE technique to address the class imbalance in my data, and I would like to visualize my labels.

I have, say, 1000 labels with the values 1, 2, and 3, and their counts are 100, 100, and 800 respectively. After SMOTE adds synthetic samples, I will have 800 samples for each of the labels 1, 2, and 3.
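The arithmetic behind that balancing step can be sketched with plain Python. This only counts labels and does not call any SMOTE implementation; the variable names are illustrative, and it assumes every class is raised to the size of the majority class, as in the example above.

```python
from collections import Counter

# Label counts from the example above: 100, 100 and 800 samples.
labels = [1] * 100 + [2] * 100 + [3] * 800
before = Counter(labels)

# Balancing to the majority class means every class ends at the largest count.
majority = max(before.values())

# Number of synthetic samples each class would need to reach that count.
synthetic_needed = {label: majority - n for label, n in before.items()}
after = {label: n + synthetic_needed[label] for label, n in before.items()}

print(dict(before))
print(synthetic_needed)
print(after)
```

Here classes 1 and 2 each need 700 synthetic samples, and class 3 needs none.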

Now I want to plot these labels before and after SMOTE. I have divided my dataset into train_X and train_y. I tried to adapt the iris example code to my dataset, but I failed. This is what I tried (it does not draw anything and raises `ValueError: 'c' argument has 154 elements, which is not acceptable for use with 'x' with size 25, 'y' with size 25.`):

from matplotlib import pyplot as plt

features = train_X
target = train_y.values.ravel()

plt.scatter(features[0], features[1], alpha=0.2,
            s=100*features[3], c=target, cmap='viridis')
plt.xlabel("L1")
plt.ylabel("L2")
plt.show()

How can I plot my labels in different colors so that I can see how the values are distributed?

iso_9001_
  • Please read [How to create a Minimal, Complete, and Verifiable example](https://stackoverflow.com/help/mcve) – Sheldore Mar 14 '19 at 12:50
  • @Bazingaa Ofc, but what's the problem with this question? – iso_9001_ Mar 14 '19 at 13:04
  • We cannot run this code. The error clearly says that the sizes of the x, y, and color parameters differ – Sheldore Mar 14 '19 at 13:56
  • Thanks for the feedback, I'll be more careful. But I'm pretty sure my parameters have the same sizes. `features` has a shape of `(154, 25)` and `target` has a shape of `(154,)` – iso_9001_ Mar 14 '19 at 14:04
  • If Python complains about it, then they cannot be the same size. Try printing their shapes on the line before the error is raised – Sheldore Mar 14 '19 at 14:34
  • If `np.shape(features) == (154, 25)` then `features[0]` will have length 25.... – Jody Klymak Mar 15 '19 at 04:54
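
A minimal sketch of the indexing point made in the last comment, assuming `features` is a 2D NumPy array of shape `(154, 25)` and `target` holds one label per sample. The random data below is a hypothetical stand-in for `train_X` and `train_y`, and the choice of the first two columns as plot axes is arbitrary.

```python
import numpy as np
from matplotlib import pyplot as plt

# Hypothetical stand-in data with the shapes from the comments:
# 154 samples, 25 features, and one label (1, 2 or 3) per sample.
rng = np.random.default_rng(0)
features = rng.normal(size=(154, 25))
target = rng.integers(1, 4, size=154)

# features[0] is the first ROW (25 values), which does not match the
# 154 labels. features[:, 0] is the first COLUMN (154 values, one per sample).
x = features[:, 0]
y = features[:, 1]

fig, ax = plt.subplots()
points = ax.scatter(x, y, c=target, cmap="viridis", alpha=0.5)
ax.set_xlabel("feature 0")
ax.set_ylabel("feature 1")
fig.colorbar(points, ax=ax, label="class label")
# plt.show()  # uncomment to display the figure interactively
```

If `train_X` is a pandas DataFrame rather than an array, `train_X.iloc[:, 0]` selects a column by position in the same way.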

0 Answers