matplotlib scatter: the more overlapping points the bigger the marker

Question

I want to make a scatter plot of two categorical variables, as follows:

from matplotlib import pyplot as plt
a = [1, 1, 1, 1, 2, 2]
b = [2, 2, 2, 2, 1, 1]
plt.scatter(a, b)

If I plot this I see only two points (4 overlapping at (1,2) and 2 overlapping at (2,1)), with no way to tell how many occurrences each point represents.

I would like a scatter plot where the marker of the left point (1,2) is twice as big as the marker of the right point (2,1), so that the different number of occurrences is visible. What is the correct way to do this? (Besides the trivial solution where I count the occurrences by hand and pass them to the size argument of plt.scatter.)

I have already searched other Stack Overflow questions, but they all propose using an alpha value, like here. I would prefer to vary the marker size, which shows the proportions between occurrences more clearly.

One possibility might be a kernel density estimate, as suggested in this answer.

To give a bit more context: the two outputs are the predictions of two classifiers, and I want to explore the differences between the predictions to evaluate whether to ensemble them.

Sheldore · Accepted Answer · 2019-03-12T22:24:23.723

5

You can use the occurrence frequency of the x-values (or the y-values, for this particular data set), which can be obtained with Counter from the collections module. The frequencies then serve as a scaling factor for the marker sizes. Here 200 is just a large factor to make the markers clearly visible. Note that this works for this data because a is grouped by value and each x-value is paired with a single y-value.

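from collections import Counter

from matplotlib import pyplot as plt

a = [1, 1, 1, 1, 2, 2]
b = [2, 2, 2, 2, 1, 1]

# Repeat each frequency once per occurrence so that weights lines up with a
# (this relies on a being grouped by value).
weights = [200 * count for count in Counter(a).values() for _ in range(count)]
plt.scatter(a, b, s=weights)
plt.show()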
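
The comprehension above depends on a being grouped by value and on x alone identifying a point. A minimal sketch of a more general variant, assuming the same data, counts (x, y) pairs instead and draws each distinct point once. The names marker_sizes, plot_counts and scale are illustrative and not part of the original answer.

```python
from collections import Counter


def marker_sizes(xs, ys, scale=200):
    """Return the distinct (x, y) points and a marker size proportional to each count."""
    counts = Counter(zip(xs, ys))
    points = list(counts)  # distinct points, in order of first appearance
    sizes = [scale * counts[point] for point in points]
    return points, sizes


def plot_counts(xs, ys, scale=200):
    """Scatter each distinct point once, sized by how often it occurs."""
    # Imported here so that marker_sizes can be used without matplotlib installed.
    from matplotlib import pyplot as plt

    points, sizes = marker_sizes(xs, ys, scale)
    px, py = zip(*points)
    plt.scatter(px, py, s=sizes)
    plt.show()


a = [1, 1, 1, 1, 2, 2]
b = [2, 2, 2, 2, 1, 1]
points, sizes = marker_sizes(a, b)
print(points, sizes)
```

Calling plot_counts(a, b) draws the figure. Since the s argument of plt.scatter is the marker area in points squared, a point with twice the count gets twice the area, not twice the diameter.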

Another option for visualising the distribution is a bar chart:

freqs = Counter(a)

# Convert the dict views to lists before passing them to matplotlib.
plt.bar(list(freqs.keys()), list(freqs.values()), width=0.5)
plt.xticks(list(freqs.keys()))
plt.show()
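
The bar chart above counts only the values in a. For comparing the predictions of two classifiers, it may be more informative to count (a, b) pairs. The following is a rough sketch under that assumption; pair_counts, labels, heights and plot_pair_bars are hypothetical names introduced here.

```python
from collections import Counter

a = [1, 1, 1, 1, 2, 2]
b = [2, 2, 2, 2, 1, 1]

# Count how often each combination of the two values occurs.
pair_counts = Counter(zip(a, b))
labels = [f"({x}, {y})" for x, y in pair_counts]
heights = list(pair_counts.values())


def plot_pair_bars(labels, heights):
    """Draw one bar per distinct (a, b) pair."""
    # Imported here so the counting above runs without matplotlib installed.
    from matplotlib import pyplot as plt

    plt.bar(labels, heights, width=0.5)  # string labels are treated as categories
    plt.ylabel("occurrences")
    plt.show()
```

Calling plot_pair_bars(labels, heights) shows one bar per pair of predictions, so agreement and disagreement between the two outputs can be read off directly.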

edited Mar 12 '19 at 22:24

answered Mar 12 '19 at 22:18


yes, Counter is a good option: do you think scatter is the best plot? or there might be some better plot to highlight the different distributions of the classifiers' predictions? – Alessandro Solbiati Mar 12 '19 at 22:21
@AlessandroSolbiati: A bar chart is another good option, which will directly show the frequency of occurrence – Sheldore Mar 12 '19 at 22:22
Still useful 5 years later, +1 and thanks! – bmasri Jan 31 '23 at 09:53
