I want to draw a scatter plot of two categorical variables, as follows:
from matplotlib import pyplot as plt
a=[1,1,1,1,2,2]
b=[2,2,2,2,1,1]
plt.scatter(a,b)
If I plot this, I see only two points (four overlapping at (1,2) and two overlapping at (2,1)), with no way to tell how many occurrences each point represents.
I would like a scatter plot where the marker at the left point (1,2) is twice as big as the marker at the right point (2,1), to show how often each point occurs. What is the correct way to do this (besides the trivial solution where I count occurrences by hand and pass them to the size argument of plt.scatter)?
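For reference, the counting in that trivial solution does not have to be done by hand. The following is only a minimal sketch, assuming collections.Counter for the counting and a marker area proportional to the count; the helper names count_points and plot_weighted_scatter and the base value of 100 are my own illustrative choices, not part of matplotlib.

```python
from collections import Counter


def count_points(a, b, base=100):
    """Collapse duplicate (x, y) pairs and return xs, ys, sizes.

    `base` is a hypothetical scale factor: the marker size for a point
    that occurs once. Sizes grow linearly with the number of occurrences.
    """
    counts = Counter(zip(a, b))  # maps (x, y) -> number of occurrences
    xs = [x for x, _ in counts]
    ys = [y for _, y in counts]
    sizes = [base * n for n in counts.values()]
    return xs, ys, sizes


def plot_weighted_scatter(a, b, base=100):
    # Imported here so the counting helper stays usable without matplotlib.
    from matplotlib import pyplot as plt

    xs, ys, sizes = count_points(a, b, base)
    # The `s` argument of plt.scatter is the marker area in points**2,
    # so a point with twice the count gets twice the area.
    plt.scatter(xs, ys, s=sizes)
    plt.show()


if __name__ == "__main__":
    a = [1, 1, 1, 1, 2, 2]
    b = [2, 2, 2, 2, 1, 1]
    plot_weighted_scatter(a, b)
```

Because s is an area, the marker at (1,2) gets twice the area of the one at (2,1), not twice the diameter; if the diameter should scale with the count instead, the sizes would have to be squared.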
I have already searched other Stack Overflow questions, but they all propose using an alpha value, like here. I would prefer to vary the marker size, which makes the relative proportions between occurrences easier to see.
One pointer might be to use some kind of Kernel Density Estimate, as suggested in this answer.
To give a bit more context to my question: the two outputs are the predictions of two classifiers, and I want to explore the differences between their predictions to decide whether to ensemble them.