Scatter plotting pandas DataFrame with categorically labeled rows/columns

Question

I would like to produce a scatter plot of a pandas DataFrame with categorical row and column labels using matplotlib. A sample DataFrame looks like this:

import matplotlib.pyplot as plt
import pandas as pd
df = pd.DataFrame({"a": [1,2], "b": [3,4]}, index=["c","d"])
#   a  b
#c  1  3
#d  2  4

The marker size is a function of the respective DataFrame value. So far, I have come up with an awkward solution that enumerates the rows and columns, plots the data, and then reconstructs the labels:

flat = df.reset_index(drop=True).T.reset_index(drop=True).T.stack().reset_index()
# level_0 = row position, level_1 = column position
#   level_0  level_1  0
#0        0        0  1
#1        0        1  3
#2        1        0  2
#3        1        1  4

# x = column position (level_1), y = row position (level_0), to match the tick labels below
flat.plot(kind='scatter', x='level_1', y='level_0', s=100*flat[0])
plt.xticks(range(df.shape[1]), df.columns)
plt.yticks(range(df.shape[0]), df.index)
plt.show()

It kind of works.

Now, my question: is there a more intuitive, more integrated way to produce this scatter plot, ideally without splitting the data and the metadata?

I don't think we can use non-numerical data for plotting. AFAIK you will have to set ticks separately anyway... — MaxU - stand with Ukraine, Jun 02 '17 at 14:51
I guess the question translates in *"Why has no library implemented my custom plotting wish function yet?"*. — ImportanceOfBeingErnest, Jun 04 '17 at 13:27
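Note for readers on newer versions: since matplotlib 2.1, plotting functions such as scatter accept string values and place them on categorical axes, which removes the manual tick relabelling. A minimal sketch, assuming a recent matplotlib in which categories get positions in order of first appearance; the long-format frame and its column names row, col and value are my own choice, not part of the question:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch; drop it to get an interactive window
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4]}, index=["c", "d"])

# One row per cell (row label, column label, value), walking the frame column by column
long = df.rename_axis("row").reset_index().melt(
    id_vars="row", var_name="col", value_name="value"
)

fig, ax = plt.subplots()
# String labels go straight onto categorical axes; no xticks/yticks bookkeeping needed
sc = ax.scatter(long["col"].tolist(), long["row"].tolist(), s=100 * long["value"].to_numpy())
```

The trade-off is that each axis follows the order in which labels first appear in the data rather than an order you set explicitly.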

score 7 · Accepted Answer · answered Jun 04 '17 at 18:28

Maybe not the entire answer you're looking for, but here is an idea that saves time and improves readability over the flat= line of code.

Calling the pandas unstack method on a DataFrame produces a Series with a MultiIndex whose two levels are the column labels and the row labels.

dfu = df.unstack()

print(dfu.index)
MultiIndex(levels=[[u'a', u'b'], [u'c', u'd']],
           labels=[[0, 0, 1, 1], [0, 1, 0, 1]])
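To see the ordering concretely, here is a small check on the sample frame (recreated so the snippet runs on its own). It assumes a DataFrame with a single-level row index; if the rows carry a MultiIndex, unstack pivots one of its levels into columns instead of returning a Series:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4]}, index=["c", "d"])

# For a flat row index, unstack() returns a Series indexed by
# (column label, row label), walking the frame column by column.
dfu = df.unstack()

print(list(dfu.index))  # [('a', 'c'), ('a', 'd'), ('b', 'c'), ('b', 'd')]
print(dfu.tolist())     # [1, 2, 3, 4]
```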

The MultiIndex contains the x and y positions needed to construct the plot (in labels, which pandas 0.24 renamed to codes). Here, I assign the levels and codes to more informative variable names better suited for plotting.

xlabels, ylabels = dfu.index.levels
xs, ys = dfu.index.codes  # dfu.index.labels in pandas < 0.24
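MultiIndex.labels was renamed to MultiIndex.codes in pandas 0.24, and the old name is gone in current releases. A sketch of the unpacking step that runs on either side of the rename; the hasattr fallback is an illustrative addition, not something from the answer:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4]}, index=["c", "d"])
dfu = df.unstack()

mi = dfu.index
# .codes on pandas >= 0.24, .labels on older releases
codes = mi.codes if hasattr(mi, "codes") else mi.labels

xlabels, ylabels = mi.levels  # unique column labels, unique row labels
xs, ys = codes                # per-cell integer positions into those labels

print(list(xlabels), list(ylabels))  # ['a', 'b'] ['c', 'd']
print(xs.tolist(), ys.tolist())      # [0, 0, 1, 1] [0, 1, 0, 1]
```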

Plotting is pretty straightforward from here.

plt.scatter(xs, ys, s=dfu*100)
plt.xticks(range(len(xlabels)), xlabels)
plt.yticks(range(len(ylabels)), ylabels)
plt.show()

I tried this on a few different DataFrame shapes and it seemed to hold up.
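Putting the steps together, a self-contained sketch for current pandas and matplotlib. The helper name categorical_scatter and its scale parameter are hypothetical, and it draws on an explicit Axes rather than through the plt state machine:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; remove for a GUI window
import matplotlib.pyplot as plt
import pandas as pd


def categorical_scatter(df, ax, scale=100):
    """Hypothetical helper: x = column, y = row, marker size = scale * cell value."""
    dfu = df.unstack()                   # Series indexed by (column label, row label)
    xlabels, ylabels = dfu.index.levels  # unique labels per level
    xs, ys = dfu.index.codes             # positions into those labels (pandas >= 0.24)
    sc = ax.scatter(xs, ys, s=scale * dfu.to_numpy())
    ax.set_xticks(range(len(xlabels)))
    ax.set_xticklabels(list(xlabels))
    ax.set_yticks(range(len(ylabels)))
    ax.set_yticklabels(list(ylabels))
    return sc


df = pd.DataFrame({"a": [1, 2], "b": [3, 4]}, index=["c", "d"])
fig, ax = plt.subplots()
sc = categorical_scatter(df, ax)
# plt.show()  # uncomment when using an interactive backend
```

Because the tick labels come from the same levels that the codes index into, point position i and tick label i always refer to the same category, even when the frame's rows or columns are not sorted.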

score 4 · Answer 2 · answered Jun 04 '17 at 12:22


It's not exactly what you were asking for, but it helps to visualize values in a similar way:

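import matplotlib.pyplot as plt
import seaborn as sns

# df[::-1] reverses the row order so the first row is drawn at the bottom
sns.heatmap(df[::-1], annot=True)
plt.show()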

Result: [image: annotated heatmap of df]

MaxU - stand with Ukraine
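If seaborn is not available, a roughly similar annotated grid can be drawn with plain matplotlib. This sketch swaps in Axes.imshow plus text annotations and only approximates sns.heatmap; origin="lower" takes the place of the df[::-1] reversal by putting the first row at the bottom:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for the sketch
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4]}, index=["c", "d"])
values = df.to_numpy()

fig, ax = plt.subplots()
# origin="lower" draws row 0 at the bottom, like sns.heatmap(df[::-1])
im = ax.imshow(values, origin="lower", cmap="viridis")
ax.set_xticks(range(df.shape[1]))
ax.set_xticklabels(list(df.columns))
ax.set_yticks(range(df.shape[0]))
ax.set_yticklabels(list(df.index))

# Rough counterpart of annot=True: write each value at the centre of its cell
for (i, j), v in np.ndenumerate(values):
    ax.text(j, i, str(v), ha="center", va="center")

fig.colorbar(im, ax=ax)
```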

score 3 · Answer 3 · answered Jun 07 '17 at 17:26

Maybe you can use a NumPy array and pd.melt to create the scatter plot, as shown below:

import numpy as np
# (x, y) = (column position, row position), in the same column-major order as pd.melt
arr = np.array([[i,j] for i in range(df.shape[1]) for j in range(df.shape[0])])
plt.scatter(arr[:,0],arr[:,1],s=100*pd.melt(df)['value'],marker='o')
plt.xlabel('column')
plt.ylabel('row')
plt.xticks(range(df.shape[1]), df.columns)
plt.yticks(range(df.shape[0]), df.index)
plt.show()
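The pairing in this answer relies on pd.melt stacking the frame column by column, which matches the list comprehension's loop order (column position outer, row position inner). A short check of that alignment, with np.meshgrid shown as an alternative way to build arr; the meshgrid variant is my substitution, not part of the answer:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4]}, index=["c", "d"])

# pd.melt walks the frame column by column: (a, c), (a, d), (b, c), (b, d)
values = pd.melt(df)["value"].to_numpy()

# The answer's coordinates: column position in the outer loop, row position in the inner loop
arr = np.array([[i, j] for i in range(df.shape[1]) for j in range(df.shape[0])])

# Equivalent grid via np.meshgrid; indexing="ij" keeps the same order after ravel()
cols, rows = np.meshgrid(np.arange(df.shape[1]), np.arange(df.shape[0]), indexing="ij")
grid = np.column_stack([cols.ravel(), rows.ravel()])

# Each melted value should sit on the cell it came from
matches = [df.iat[r, c] == v for (c, r), v in zip(arr, values)]
print(all(matches), (arr == grid).all())  # True True
```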
