
I have several pandas dataframes. I want to plot several columns against one another in separate scatter plots, and combine them as subplots in a figure. I want to label each subplot accordingly. I had a lot of trouble with getting subplot labels working, until I discovered that there are two ways of plotting directly from dataframes, as far as I know; see SO and pandasdoc:

ax0 = plt.scatter(df.column0, df.column5)
type(ax0): matplotlib.collections.PathCollection

and

ax1 = df.plot(0,5,kind='scatter')
type(ax1): matplotlib.axes._subplots.AxesSubplot

ax.set_title('title') works on ax1 but not on ax0, which returns AttributeError: 'PathCollection' object has no attribute 'set_title'

I don't understand why two separate ways exist. What is the purpose of the first method, which returns a PathCollection? The second one was added in 17.0; is the first one obsolete, or does it have a different purpose?

    Since this may be useful to anyone visiting the question, I have found `df.plot(0,5,style='.')` to work better than `df.plot(0,5,kind='scatter')` since the former will work after using groupby() whereas the latter does not – alexbhandari Mar 13 '18 at 22:33

2 Answers


As you have found, the pandas function returns an axes object. plt.scatter returns the PathCollection it drew rather than the axes, but you can still get the axes it was drawn on with the "get current axes" function, plt.gca(). For instance:

plot = plt.scatter(df.column0, df.column5)
ax0 = plt.gca()
type(ax0)

matplotlib.axes._subplots.AxesSubplot

A more standard way you might see this is the following:

fig = plt.figure()
ax0 = fig.add_subplot(111)
ax0.scatter(df.column0, df.column5)

At this point you can call any of the "set" methods on ax0, such as your set_title.
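For the subplot case in the question, a minimal sketch of both routes side by side might look like the following. The dataframe contents and the column1 column are made up for illustration, and the Agg backend is only selected so the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; an assumption so this runs headless
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical data; column0/column5 follow the question, column1 is invented.
df = pd.DataFrame({
    "column0": [1, 2, 3, 4],
    "column1": [2, 1, 4, 3],
    "column5": [4, 3, 2, 1],
})

# One figure with two subplots; each subplot is its own Axes object.
fig, (ax_left, ax_right) = plt.subplots(1, 2, figsize=(8, 3))

# matplotlib route: scatter on the Axes returns the points (a PathCollection),
# so the title is set on the Axes, not on the return value.
points = ax_left.scatter(df["column0"], df["column5"])
ax_left.set_title("column0 vs column5")

# pandas route: pass the target Axes with ax=; pandas returns that same Axes.
returned = df.plot(x="column0", y="column1", kind="scatter", ax=ax_right)
returned.set_title("column0 vs column1")

fig.tight_layout()
```

Creating the axes first with plt.subplots and then drawing into them keeps the labelling the same whichever library does the plotting.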

Hope this helps.

Alex

The difference between the two is that they come from different libraries: the first is from matplotlib, the second from pandas. Both do the same thing, which is create a matplotlib scatter plot (pandas calls matplotlib to do the drawing), but the matplotlib version returns the collection of points it drew, whereas the pandas version returns the matplotlib subplot (axes) containing them. This makes the matplotlib version a bit more versatile, as you keep a handle on the points themselves and can use it afterwards, for example to restyle them or attach a colorbar.
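As a rough illustration of what the returned collection of points can be used for, here is a small sketch. The data is invented, and the colorbar and resizing are just two example uses, not the only ones.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; an assumption so this runs headless
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical data using the question's column names.
df = pd.DataFrame({"column0": [1, 2, 3], "column5": [3, 1, 2]})

fig, ax = plt.subplots()

# scatter returns the PathCollection holding the drawn markers.
points = ax.scatter(df["column0"], df["column5"], c=df["column5"])

# The collection carries the color mapping, so it can drive a colorbar.
fig.colorbar(points, ax=ax)

# The markers can be restyled after the fact through the same handle.
points.set_sizes([80])

# Every artist records the Axes it belongs to, so the title stays reachable.
points.axes.set_title("column0 vs column5")
```

So the PathCollection and the axes are two handles on the same plot: one for the markers, one for the subplot around them.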

Jesse Bakker