
I'm using some data on fungicide usage in a pandas DataFrame, with columns for Year, Fungicide and Value (the amount used), along with some irrelevant columns. It looks somewhat like:

Year, State,      Fungicide, Value
2011, California, A,         12879
2011, California, B,         29572
2011, Florida,    A,         8645
2011, Florida,    B,         19573
2009, California, A,         8764
2009, California, B,         98643
...

What I want from it is a single plot of total fungicide used over time, with a line plotted for each individual fungicide (in a different colour). I've used .groupby to get the total amount of each fungicide used each year:

apple_fplot = df.groupby(['Year','Fungicide'])['Value'].sum()

This gives me the values I want to plot, something like:

Year, Fungicide, Value
...
2009, A,        128635
      B,        104765
2011, A,        154829
      B,        129865
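A minimal reproducible version of this setup, assuming only the six sample rows shown above (the elided "..." rows are left out, so the totals differ from the illustrative ones):

```python
import pandas as pd

# Only the sample rows from the question; the elided "..." rows are not reproduced
df = pd.DataFrame({
    'Year': [2011, 2011, 2011, 2011, 2009, 2009],
    'State': ['California', 'California', 'Florida', 'Florida', 'California', 'California'],
    'Fungicide': ['A', 'B', 'A', 'B', 'A', 'B'],
    'Value': [12879, 29572, 8645, 19573, 8764, 98643],
})

# Total Value per (Year, Fungicide); the result is a Series with a two-level index
apple_fplot = df.groupby(['Year', 'Fungicide'])['Value'].sum()
print(apple_fplot)
```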

Now I need to plot it so that each fungicide (A, B, ...) is a separate line on a single plot of Value over time.

Is there a way of doing this without separating it all out? Forgive my ignorance, I'm new to Python and still getting familiar with it.

A. Chatfield

3 Answers


For a clean solution that correctly shows the legend and x-ticks, you could do:

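import pandas as pd

apple_fplot = df.groupby(['Year', 'Fungicide'])['Value'].sum()
# Pivot Fungicide into columns: one column (and one line) per fungicide
plot_df = apple_fplot.unstack('Fungicide')
# Convert integer years to annual periods so the x-ticks show whole years
plot_df.index = pd.PeriodIndex(plot_df.index.astype(str), freq='Y')
plot_df.plot()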
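The key step is `unstack`: it moves the `Fungicide` index level into the columns, and `DataFrame.plot` then draws each column as its own line. A small sketch that shows the reshaped table, assuming the six sample rows from the question and a non-interactive `Agg` backend so it runs without a display:

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend so the sketch runs headless
import pandas as pd

df = pd.DataFrame({
    'Year': [2011, 2011, 2011, 2011, 2009, 2009],
    'State': ['California', 'California', 'Florida', 'Florida', 'California', 'California'],
    'Fungicide': ['A', 'B', 'A', 'B', 'A', 'B'],
    'Value': [12879, 29572, 8645, 19573, 8764, 98643],
})

long_series = df.groupby(['Year', 'Fungicide'])['Value'].sum()
# Move the 'Fungicide' level from the row index into the columns: rows = years, columns = fungicides
wide = long_series.unstack('Fungicide')
print(wide)

# Each column (A, B) is drawn as a separate line on the same axes
ax = wide.plot()
```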

For subplots, just set the subplots keyword to True:

plot_df.plot(subplots=True)

to get one subplot per fungicide.
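When there are many fungicides, the default single column of subplots gets squashed. A sketch using the `layout` and `figsize` parameters of `DataFrame.plot` to arrange them in a grid; the columns C and D and their values are hypothetical stand-ins for a wider table:

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend so the sketch runs headless
import pandas as pd

# Hypothetical wide table: one column per fungicide, one row per year
plot_df = pd.DataFrame(
    {'A': [8764, 21524], 'B': [98643, 49145], 'C': [500, 700], 'D': [1200, 900]},
    index=[2009, 2011],
)

# Arrange the subplots in a 2 x 2 grid and enlarge the figure so panels are not squashed
axes = plot_df.plot(subplots=True, layout=(2, 2), figsize=(10, 6))
print(axes.shape)
```

The returned `axes` array has the shape given by `layout`, so individual panels can still be adjusted afterwards.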

Stefan
  • Thanks, that works really well. As an aside, can I modify this code to also produce a separate plot for each line? I have too many lines to show on one plot without obscuring the data (not to mention that the legend covers half of the plot if I show it). I've tried running the unstacked groupby through a for loop but can't get that working – A. Chatfield Dec 13 '15 at 12:48
  • Thanks again, but the problem is that there are so many lines that, when plotted as subplots in a single figure, they get vertically squashed to the point of being unreadable. Ideally, I would plot each line as a separate plot and save each one to a separate file path. To do this I tried a for loop: `afplot = apple_fplot.unstack('Domain Category') for i, column in afplot: plt.figure(i);afplot[column].plot() plt.savefig('.../apple fplot{}'.format(i))` I'm not sure whether that could work, but it gives me: ValueError: too many values to unpack – A. Chatfield Dec 13 '15 at 15:40
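The ValueError in that loop comes from iterating over a DataFrame directly, which yields only the column labels, not (label, column) pairs. A minimal sketch of the one-file-per-line loop using `DataFrame.items()`, assuming a small hypothetical wide table in place of the real unstacked data, and a temporary directory with a hypothetical `apple_fplot_<name>.png` file name pattern in place of the real output path:

```python
import os
import tempfile

import matplotlib
matplotlib.use('Agg')  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical wide table standing in for apple_fplot.unstack(...)
afplot = pd.DataFrame({'A': [8764, 21524], 'B': [98643, 49145]}, index=[2009, 2011])

out_dir = tempfile.mkdtemp()  # stand-in for the real output folder
saved = []
# .items() yields (column label, Series) pairs; plain iteration yields labels only
for name, series in afplot.items():
    fig, ax = plt.subplots()          # a fresh figure for each line
    series.plot(ax=ax, title=str(name))
    path = os.path.join(out_dir, 'apple_fplot_{}.png'.format(name))
    fig.savefig(path)
    plt.close(fig)                    # free memory when there are many columns
    saved.append(path)
print(saved)
```

If the real column labels contain characters such as `/`, they may need cleaning before being used in a file name.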

You can do:

import matplotlib.pyplot as plt
plt.style.use('ggplot')

# Sum only Value per (Year, Fungicide), then pivot Fungicide into columns: one line each
df.groupby(['Year', 'Fungicide'])['Value'].sum().unstack().plot()
plt.show()


Data

   Year        State Fungicide  Value
0  2011   California         A  12879
1  2011   California         B  29572
2  2011      Florida         A   8645
3  2011      Florida         B  19573
4  2009   California         A   8764
5  2009   California         B  98643
Colonel Beauvel

Something along the lines of:

import matplotlib.pyplot as plt

fig, ax = plt.subplots()  # shared axes that every line is drawn on
for key, group in df.groupby('Fungicide'):
    group.groupby('Year')['Value'].sum().plot(ax=ax, label=key)
ax.legend()

Looping over a groupby object iterates through each group, yielding the key (e.g. 'A' or 'B', the value of the column it was grouped by) and the group's DataFrame on each iteration.
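A small sketch of that iteration on the sample rows from the question, collecting the keys and group sizes to show what each step yields:

```python
import pandas as pd

df = pd.DataFrame({
    'Year': [2011, 2011, 2011, 2011, 2009, 2009],
    'State': ['California', 'California', 'Florida', 'Florida', 'California', 'California'],
    'Fungicide': ['A', 'B', 'A', 'B', 'A', 'B'],
    'Value': [12879, 29572, 8645, 19573, 8764, 98643],
})

keys, sizes = [], []
for key, group in df.groupby('Fungicide'):
    # key is the grouping value ('A' or 'B'); group holds only the rows with that value
    keys.append(key)
    sizes.append(len(group))
    print(key, group.shape)
```

Groups come back sorted by key by default, which is why 'A' is visited before 'B'.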

See the pandas documentation on iterating through groups for an example:

http://pandas.pydata.org/pandas-docs/stable/groupby.html#iterating-through-groups

Chris