I have a dataset:
- almost 45K samples
- 8 features
- 4 classes
The classes are imbalanced: each class makes up a different percentage of the samples. I wanted to draw a scatter plot for every pair of features, that is, 28 plots, using the whole dataset.
So in the end I get, for each plot, a scatter where I can see how the samples are distributed by class. However, I have seen an example in a book where these scatter plots are drawn using the same number of samples for each class.
For example: 100 samples of class0, 100 samples of class1, 100 samples of class2, 100 samples of class3.
Question: Is it correct to plot the whole dataset, with a different percentage of samples for each class, or should I use the same number of samples per class as the book does?
Note: I want to get a view of whether the classes are linearly separable when the features are taken in pairs.
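
To make the two options concrete, here is a minimal sketch of what I mean. It is not my real code: the function names (`feature_pairs`, `balanced_indices`, `plot_pairs`) are made up for illustration, and I assume the features are in a list of rows `X`, the class labels are integers in `labels`, and matplotlib is available for the plotting part.

```python
import random
from collections import defaultdict
from itertools import combinations


def feature_pairs(n_features):
    """All unordered pairs of feature indices; 8 features give 28 pairs."""
    return list(combinations(range(n_features), 2))


def balanced_indices(labels, n_per_class, seed=0):
    """Pick up to n_per_class random row indices from each class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    chosen = []
    for y in sorted(by_class):
        idx = by_class[y]
        # Classes smaller than n_per_class are kept whole.
        chosen.extend(rng.sample(idx, min(n_per_class, len(idx))))
    return chosen


def plot_pairs(X, labels, indices):
    """Scatter every feature pair for the given rows, colored by class.

    Assumes 8 features (the 4 x 7 grid holds exactly 28 plots) and
    numeric class labels, which matplotlib maps to colors.
    """
    import matplotlib.pyplot as plt  # imported here so the helpers above work without it

    pairs = feature_pairs(len(X[0]))
    fig, axes = plt.subplots(4, 7, figsize=(28, 16))
    for ax, (a, b) in zip(axes.ravel(), pairs):
        ax.scatter(
            [X[i][a] for i in indices],
            [X[i][b] for i in indices],
            c=[labels[i] for i in indices],
            s=5,
            alpha=0.5,
        )
        ax.set_xlabel(f"feature {a}")
        ax.set_ylabel(f"feature {b}")
    return fig
```

Calling `plot_pairs(X, labels, range(len(labels)))` would correspond to what I did (the whole dataset), while `plot_pairs(X, labels, balanced_indices(labels, 100))` would correspond to the book's approach with 100 samples per class.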