groupby multiple values, and plotting results

Question

I'm using some data on fungicide usage which has the Year, Fungicide, and Amount used, along with some irrelevant columns, in a pandas DataFrame. It looks somewhat like:

Year, State,      Fungicide, Value
2011, California, A,         12879
2011, California, B,         29572
2011, Florida,    A,         8645
2011, Florida,    B,         19573
2009, California, A,         8764
2009, California, B,         98643
...

What I want from it is a single plot of total fungicide used over time, with a line plotted for each individual fungicide (in a different colour). I've used .groupby to get the total amount of each fungicide used each year:

apple_fplot = df.groupby(['Year','Fungicide'])['Value'].sum()

This gives me the values I want to plot, something like:

Year, Fungicide, Value
...
2009, A,        128635
      B,        104765
2011, A,        154829
      B,        129865

Now I need to plot it so that each fungicide (A, B, ...) is a separate line on a single plot of Value over time.

Is there a way of doing this without separating it all out? Forgive my ignorance, I'm new to Python and am still getting familiar with it.

Can't you simply groupby fungicide as well? – Tomasz Kaminski Dec 11 '15 at 14:40

Stefan · Accepted Answer · 2015-12-13T14:12:36.470

11

For a clean solution that properly prints the legend and xticks, you could do:

import pandas as pd

apple_fplot = df.groupby(['Year','Fungicide'])['Value'].sum()
# apple_fplot is a Series, so unstack already gives one column per fungicide
plot_df = apple_fplot.unstack('Fungicide')
plot_df.index = pd.PeriodIndex(plot_df.index.tolist(), freq='A')
plot_df.plot()
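To see what the unstack step produces before anything is plotted, here is a minimal sketch built from the six sample rows in the question. The column names come from the question; the variable names `totals` and `plot_df` are only illustrative, and the real data will of course have more years and fungicides.

```python
import pandas as pd

# Sample rows copied from the question
df = pd.DataFrame({
    "Year": [2011, 2011, 2011, 2011, 2009, 2009],
    "State": ["California", "California", "Florida", "Florida",
              "California", "California"],
    "Fungicide": ["A", "B", "A", "B", "A", "B"],
    "Value": [12879, 29572, 8645, 19573, 8764, 98643],
})

# Series indexed by (Year, Fungicide) holding the yearly totals
totals = df.groupby(["Year", "Fungicide"])["Value"].sum()

# Pivot the inner index level into columns: rows are years, columns are fungicides
plot_df = totals.unstack("Fungicide")
print(plot_df)
```

Each column of `plot_df` is then drawn as its own line by `plot_df.plot()`, with the column labels used for the legend.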

For subplots, just set the respective keyword to True:

plot_df.plot(subplots=True)

This draws each fungicide in its own panel.

edited Dec 13 '15 at 14:12

answered Dec 11 '15 at 15:06


Thanks, that works really well. As an aside; can I modify this code to also produce a plot for each line? It occurs to me that I have too many lines to show on one plot without obscuring the data. (not to mention that the figure legend covers half of the plot if I show it). I've tried running the unstacked groupby through a for loop but can't seem to get that working – A. Chatfield Dec 13 '15 at 12:48
Thanks again, but the problem is that there are so many lines I want to plot that when plotting them as subplots on a single plot it becomes vertically squashed to the point of it being totally unreadable. Ideally, I would have each line plotted as a separate plot, and saved to a separate file path. To do this I was trying to do a for loop: `afplot = apple_fplot.unstack('Domain Category') for i, column in afplot: plt.figure(i);afplot[column].plot() plt.savefig('.../apple fplot{}'.format(i))` I'm not sure if that could work, but it gives me: ValueError: too many values to unpack – A. Chatfield Dec 13 '15 at 15:40
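The ValueError in the comment above most likely comes from the loop header: iterating over a DataFrame yields only the column labels, so `for i, column in afplot` tries to unpack each label into two names. A minimal sketch of one way around it, using `enumerate` to supply the counter, is shown below. It assumes the sample data from the question; the output folder (a temporary directory) and the `fplot_{}.png` file names are hypothetical stand-ins for the elided path in the comment.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Sample rows copied from the question
df = pd.DataFrame({
    "Year": [2011, 2011, 2011, 2011, 2009, 2009],
    "State": ["California", "California", "Florida", "Florida",
              "California", "California"],
    "Fungicide": ["A", "B", "A", "B", "A", "B"],
    "Value": [12879, 29572, 8645, 19573, 8764, 98643],
})

plot_df = df.groupby(["Year", "Fungicide"])["Value"].sum().unstack("Fungicide")

# Hypothetical output folder; replace with the real destination
out_dir = tempfile.mkdtemp()

saved = []
# Iterating a DataFrame yields column labels, so enumerate() provides the index i
for i, column in enumerate(plot_df):
    fig, ax = plt.subplots()
    plot_df[column].plot(ax=ax, title="Fungicide {}".format(column))
    ax.set_xlabel("Year")
    ax.set_ylabel("Value")
    path = os.path.join(out_dir, "fplot_{}.png".format(i))
    fig.savefig(path)
    plt.close(fig)  # release the figure before drawing the next one
    saved.append(path)
```

Creating a fresh figure per column and closing it after saving keeps memory use flat when there are many fungicides.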

score 11 · Answer 2 · answered Dec 11 '15 at 15:09

You can do:

import matplotlib.pyplot as plt

plt.style.use('ggplot')

# Sum only the numeric 'Value' column so the 'State' strings are left out
df.groupby(['Year','Fungicide'])['Value'].sum().unstack().plot()
plt.show()

Data

   Year        State Fungicide  Value
0  2011   California         A  12879
1  2011   California         B  29572
2  2011      Florida         A   8645
3  2011      Florida         B  19573
4  2009   California         A   8764
5  2009   California         B  98643

score 6 · Answer 3 · answered Dec 11 '15 at 15:04

Something along the lines of:

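import matplotlib.pyplot as plt

fig, ax = plt.subplots()  # one shared axes for all fungicides
df_grouped = df.groupby('Fungicide')
for key, group in df_grouped:
    group.groupby('Year')['Value'].sum().plot(ax=ax, label=key)
ax.legend()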

Using a for loop on a groupby object iterates through each group, yielding the key (e.g. 'A' or 'B', the values of the column it was grouped by) and that group's DataFrame each time.

See here for an example

http://pandas.pydata.org/pandas-docs/stable/groupby.html#iterating-through-groups
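As a small illustration of that iteration, the sketch below loops over the groups of the sample data from the question and records the row count and total per fungicide. The dictionaries `group_sizes` and `group_totals` are hypothetical names used only for this example.

```python
import pandas as pd

# Sample rows copied from the question
df = pd.DataFrame({
    "Year": [2011, 2011, 2011, 2011, 2009, 2009],
    "State": ["California", "California", "Florida", "Florida",
              "California", "California"],
    "Fungicide": ["A", "B", "A", "B", "A", "B"],
    "Value": [12879, 29572, 8645, 19573, 8764, 98643],
})

group_sizes = {}
group_totals = {}
# Each iteration yields the group key and the sub-DataFrame for that key
for key, group in df.groupby("Fungicide"):
    group_sizes[key] = len(group)
    group_totals[key] = int(group["Value"].sum())
    print(key, group_sizes[key], group_totals[key])
```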
