I would like to produce a scatter plot of pandas DataFrame with categorical row and column labels using matplotlib
. A sample DataFrame looks like this:
import pandas as pd
df = pd.DataFrame({"a": [1,2], "b": [3,4]}, index=["c","d"])
# a b
#c 1 2
#d 3 4
The marker size is the function of the respective DataFrame values. So far, I came up with an awkward solution that essentially enumerates the rows and columns, plots the data, and then reconstructs the labels:
flat = df.reset_index(drop=True).T.reset_index(drop=True).T.stack().reset_index()
# level_0 level_1 0
#0 0 0 1
#1 0 1 2
#2 1 0 3
#3 1 1 4
flat.plot(kind='scatter', x='level_0', y='level_1', s=100*flat[0])
plt.xticks(range(df.shape[1]), df.columns)
plt.yticks(range(df.shape[0]), df.index)
plt.show()
Now, question: Is there a more intuitive, more integrated way to produce this scatter plot, ideally without splitting the data and the metadata?