Pandas -- Plotting cumulative days vs volume

Question

I'm trying to plot volume/day vs days using pandas but my solution doesn't output more than the first value(CumTime[0],R[0]). Below is an example of how the initial table looks like and accompanying it are the output / result I was looking to get. Any suggestion / help will be appreciated. Thanks

Table:

<table style="width:50%">
  <tr>
    <th>ID</th>
    <th>Date</th> 
    <th>Days</th>
    <th>Volume/Day</th>
  </tr>
    <tr>
    <td>a2</td>
    <td>01/01/2014</td>
    <td>20</td>
    <td>60</td>
  </tr>
  <tr>
    <td>a1</td>
    <td>01/01/2014</td>
    <td>15</td>
    <td>100</td>
  </tr>
  <tr>
    <td>a1</td>
    <td>02/01/2014</td>
    <td>30</td>
    <td>80</td>
  </tr>
  <tr>
    <td>a2</td>
    <td>02/01/2014</td>
    <td>20</td>
    <td>40</td>
  </tr>
</table>

Tried solution:

df_grp=df.groupby('ID')
for key, grp in df_grp:
def final_result(all_data):
    for key, grp in all_data:
        grp.set_index('Date',inplace=True)
        CumTime = grp['Days'].cumsum()
        R = grp['Volume/Day']
    return CumTime,R  

CumTime,R = final_result(df_grp)

Expected Result:

<table style="width:50%">
  <tr>
    <th>ID</th>
    <th>Date</th>
    <th>Days(***Cumulative_days)</th>
    <th>Volume/Day</th>
  </tr>
  <tr>
    <td>a1</td>
    <td>01/01/2014</td>
    <td>15</td>
    <td>100</td>
  </tr>
  <tr>
    <td>a1</td>
    <td>02/01/2014</td>
    <td>45</td>
    <td>80</td>
  </tr>
 <tr>
    <th>ID</th>
    <th>Date</th>
    <th>Days(***Cumulative_days)</th>
    <th>Volume/Day</th>
  </tr>
  <tr>
    <td>a2</td>
    <td>01/01/2014</td>
    <td>65</td>
    <td>60</td>
  </tr>
  <tr>
    <td>a2</td>
    <td>02/01/2014</td>
    <td>85</td>
    <td>40</td>
  </tr>
</table>

The data you provided has unique IDs. What will a groupby be of use here for? — cs95, Sep 08 '17 at 23:54
@COLDSPEED, just updated the table. Each ID has it's values on the same month to month time frame. Does that help answer your question? — dlvr, Sep 09 '17 at 03:44
@COLDSPEED, you understood me right based on how I had the table set up earlier. Sorry, I keep changing how the result should look like. I wanted your solution below to be grouped by the ID and indexed by the date. I just updated how I would like the table to look like above. Can you help out with that? Thanks — dlvr, Sep 09 '17 at 18:34
So it was right, but you want to separate the result based on the id? — cs95, Sep 09 '17 at 18:35
Yes, you were right and I want to separate the result based on the ID and indexed by the date — dlvr, Sep 09 '17 at 18:38

cs95 · Accepted Answer · 2017-09-09T19:57:22.850

0

IIUC, you can sort by ID and Days and then find the cumsum.

df = df.sort_values(['ID', 'Days'])
df.Days = df.Days.cumsum()
df

   ID        Date  Days  Volume/Day
1  a1  01/01/2014    15         100
2  a1  02/01/2014    45          80
0  a2  01/01/2014    65          60
3  a2  02/01/2014    85          40

To just display, you can call df.groupby and print each group separately.

for k, g in df.groupby("ID"):
     print(g.drop('ID', 1).set_index("Date"))

            Days  Volume/Day
Date                        
01/01/2014    15         100
02/01/2014    45          80

            Days  Volume/Day
Date                        
01/01/2014    65          60
02/01/2014    85          40

To plot, use plt.subplots:

import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(8,6))
for k, g in df.groupby("ID"):
    g.plot(x='Days', y='Volume/Day', ax=ax, label=k)

ax.set_xlabel('Days')
ax.set_ylabel('Volume/Day')

plt.legend(loc='best')
plt.show()

edited Sep 09 '17 at 19:57

answered Sep 09 '17 at 03:48

cs95

379,657
97
704
746

@COLDSPEED, thanks for your help so far. This is close to the solution I wanted but I didn't capture the result I wanted well enough. I wanted the solution you put up to be grouped by the ID and indexed by the date. I just updated how I would like the table to look like. Can you help with that? Thanks – dlvr Sep 09 '17 at 18:27
@dlvr Does this answer your question sufficiently? – cs95 Sep 09 '17 at 19:57
@COLDSPEED, it doesn't sufficiently. Although, this works for this sample data set of few points but, when I run it on the main data set, it produces no plot or error. All I see is the asterix on the side(*) on my jupyter notebook. Any idea what may cause this? Thanks – dlvr Sep 09 '17 at 23:14
@dlvr asterisk means you have a lot of data which is taking a long time to process. – cs95 Sep 11 '17 at 21:56
@dlvr If my answer was not helpful, I'm going to delete it. – cs95 Sep 12 '17 at 02:01
@COLDSPEED, it was more than helpful and I appreciate. Still getting the hang of things using S.O more effectively. Thanks again – dlvr Sep 12 '17 at 07:49
@COLDSPEED, I posted this question (https://stackoverflow.com/questions/46230399/pandas-getting-column-values-from-groupby-operation) which is somewhat related to this question you answered here. Please can you take a look at it. Thanks – dlvr Sep 15 '17 at 01:38

Pandas -- Plotting cumulative days vs volume

1 Answers1