I have the following df:
import pandas as pd

df = pd.DataFrame([
        ['A', 'X', '2020-10-01', 1],
        ['A', 'X', '2020-10-02', 2],
        ['A', 'X', '2020-10-03', 3],
        ['A', 'Y', '2020-10-01', 4],
        ['A', 'Y', '2020-10-02', 5],
        ['A', 'Y', '2020-10-03', 6],
        ['B', 'Z', '2020-10-01', 7],
        ['B', 'Z', '2020-10-02', 8],
        ['B', 'Z', '2020-10-03', 9],
        ['B', 'Z', '2020-10-01', 10],
        ['B', 'Z', '2020-10-02', 11],
        ['B', 'Z', '2020-10-03', 12],
    ],
    columns=['Q', 'W', 'DT', 'V']
)
I would like to create a scatter plot:
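import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(12, 8), frameon=False)
fig.suptitle('Plotz', fontsize=16)
ax.set_title('DF Plot')
# Marker size is taken from V, so larger values draw larger markers
ax.scatter(x=df.DT, y=df.W, s=df.V)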
This created the following chart:
I would like to understand what actually happens, since the graph shows 9 data points while the data contains 12. Annotating the chart does not help: for each point in the top row it draws two values on top of each other.
for i, txt in enumerate(df.V):
    ax.annotate(txt, (df.DT[i], df.W[i]), fontsize=14)
Is there a way to figure out what really happens under the hood when there are multiple values for the same (x, y) pair, as in this case?
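One way to check this, as far as I can tell: the collection returned by `ax.scatter` keeps one offset per input row, so nothing is dropped or merged, and a `groupby` on the coordinate columns shows which positions are shared. My understanding is that the markers are painted in row order, so a later row simply covers an earlier one at the same position. A minimal sketch (the `Agg` backend and the names `points`, `counts` and `overlaps` are my own choices for illustration):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so this runs without a display
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame(
    [
        ['A', 'X', '2020-10-01', 1], ['A', 'X', '2020-10-02', 2],
        ['A', 'X', '2020-10-03', 3], ['A', 'Y', '2020-10-01', 4],
        ['A', 'Y', '2020-10-02', 5], ['A', 'Y', '2020-10-03', 6],
        ['B', 'Z', '2020-10-01', 7], ['B', 'Z', '2020-10-02', 8],
        ['B', 'Z', '2020-10-03', 9], ['B', 'Z', '2020-10-01', 10],
        ['B', 'Z', '2020-10-02', 11], ['B', 'Z', '2020-10-03', 12],
    ],
    columns=['Q', 'W', 'DT', 'V'],
)

fig, ax = plt.subplots()
points = ax.scatter(x=df.DT, y=df.W, s=df.V)

# Number of markers Matplotlib actually holds (one per row of df)
n_drawn = len(points.get_offsets())

# Number of rows at each (DT, W) position; more than 1 means overlap
counts = df.groupby(['DT', 'W']).size()
overlaps = counts[counts > 1]

print(n_drawn, len(counts))
print(overlaps)
```

If the two numbers differ, the "missing" points are just hidden behind others, and `overlaps` lists the positions where that happens.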
Update: Maybe I was not clear. What is Matplotlib's default behaviour in this scenario? Does the last value win? How could I display the actual value on the plot, rather than both values as the annotate code above does?
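If "the actual value" means a single number per position, one option is to aggregate the DataFrame before plotting, so that each (x, y) pair gets exactly one marker and one label. This is only a sketch: summing `V` is my assumption about how duplicates should be combined (`max`, `mean` or `last` would work the same way), and `agg` is a name made up for the example.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so this runs without a display
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame(
    [
        ['A', 'X', '2020-10-01', 1], ['A', 'X', '2020-10-02', 2],
        ['A', 'X', '2020-10-03', 3], ['A', 'Y', '2020-10-01', 4],
        ['A', 'Y', '2020-10-02', 5], ['A', 'Y', '2020-10-03', 6],
        ['B', 'Z', '2020-10-01', 7], ['B', 'Z', '2020-10-02', 8],
        ['B', 'Z', '2020-10-03', 9], ['B', 'Z', '2020-10-01', 10],
        ['B', 'Z', '2020-10-02', 11], ['B', 'Z', '2020-10-03', 12],
    ],
    columns=['Q', 'W', 'DT', 'V'],
)

# Collapse duplicates: one row per (DT, W) position.
# Summing is an assumption; swap in another aggregation if needed.
agg = df.groupby(['DT', 'W'], as_index=False)['V'].sum()

fig, ax = plt.subplots(figsize=(12, 8))
ax.scatter(x=agg.DT, y=agg.W, s=agg.V)

# Exactly one annotation per position, showing the aggregated value
for row in agg.itertuples(index=False):
    ax.annotate(str(row.V), (row.DT, row.W), fontsize=14)

fig.savefig("scatter_aggregated.png")
```

With this data the three `Z` positions end up labelled with the combined values of the two rows that shared each position, instead of two labels drawn on top of each other.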
After more googling, I think this is the answer:
Visualization of scatter plots with overlapping points in matplotlib