12

I have a dataframe such as the following:

  Index      Return
2008-11-21   0.153419
2008-11-24   0.037421
2008-11-25   0.077500

What's the best way to calculate a cumulative return across all columns on the last row?

Following is the intended result:

  Index      Return
2008-11-21   0.153419
2008-11-24   0.037421
2008-11-25   0.077500
Cumulative   0.289316

Where cumulative return calculated as follows:

cumulative = (1 + return1) * (1 + return2) * (1 + return3) - 1 

What is the best way to perform this in pandas?

ettanany
  • 19,038
  • 9
  • 47
  • 63
Kelaref
  • 547
  • 1
  • 8
  • 26

5 Answers5

14

there is a pandas cumprod() method for that. this will work for every columns.

df.ix["Cumulative"] = ((df+1).cumprod()-1).iloc[-1]

this would be about 2 time faster than other solutions on large dataset:

In[106]: %timeit df.ix["Cumulative"] = ((df+1).cumprod()-1).iloc[-1]
10 loops, best of 3: 18.4 ms per loop
In[107]: %timeit df.ix['Cummulative'] = df.apply(lambda x: (x+1).prod()-1)
10 loops, best of 3: 32.9 ms per loop
In[110]: %timeit df.append(df.iloc[:,1:].apply(lambda col: (col + 1).prod() - 1), ignore_index=True)
10 loops, best of 3: 37.1 ms per loop
In[113]: %timeit df.append(df.apply(lambda col: prod([(1+c) for c in col]) - 1), ignore_index=True)
1 loop, best of 3: 262 ms per loop

I would suggest to never use apply if you can find a built-in method since apply is looping over the dataframe which makes it slow. Bult-in method are highly efficient and normally there is no way you are going to get faster than them using apply.

Steven G
  • 16,244
  • 8
  • 53
  • 77
  • that was brilliantly faster!! but i have one column that includes NaN. Any workaround to skip those and not return NaN? – Kelaref Nov 25 '16 at 21:01
  • 2
    `df.ix["Cumulative"] = ((df.fillna(0)+1).cumprod()-1).iloc[-1]` would replace NaN with 0 return. – Steven G Nov 25 '16 at 21:03
4

Another solution:

df.ix["Cumulative"] = (df['Return']+1).prod() - 1

This will add 1 to the df['Return'] column, multiply all the rows together, and then subtract one from the result. This will result in a simple float value. The result will then be placed at the index "Cumulative". Since that index doesn't exist yet, it will be appended to the end of the DataFrame:

               Return
2008-11-21   0.153419
2008-11-25   0.077500
2008-11-24   0.037421
Cummulative  0.289316

If you want to apply this across multiple columns:

df.ix['Cummulative'] = df.apply(lambda x: (x+1).prod()-1)

This would output the following (I made a second column called "Return2" that is a copy of "Return"):

               Return   Return2
2008-11-21   0.153419  0.153419
2008-11-25   0.077500  0.077500
2008-11-24   0.037421  0.037421
Cummulative  0.289316  0.289316
TheF1rstPancake
  • 2,318
  • 17
  • 17
2

With pandas, you can use the prod() method:

df.append(df.iloc[:,1:].apply(lambda col: (col + 1).prod() - 1), ignore_index=True)

#        Index    Return
#0  2008-11-21  0.153419
#1  2008-11-24  0.037421
#2  2008-11-25  0.077500
#3         NaN  0.289316

Or as @Randy C commented, this can be further simplified to:

df.append((df.iloc[:,1:] + 1).prod() - 1, ignore_index=True)
Psidom
  • 209,562
  • 33
  • 339
  • 356
  • 1
    Nice one. For just doing the calculation, it can be simplified a little to `(df['Return']+1).prod() - 1`. – Randy Nov 25 '16 at 19:37
  • @RandyC Yeah, that is a more succinct way for this problem. – Psidom Nov 25 '16 at 19:41
  • Thx @Psidom , i did df = the above, it works great, except that my date index disappeared, and the cum return under the first column returns a NaN, despite not including any NaNs and the cumulative is not equal to zero. Any ideas why? – Kelaref Nov 25 '16 at 19:54
1

One option is to just use reduce, though others might be able to come up with faster vectorized methods:

In [10]: pd.read_clipboard()
Out[10]:
        Index    Return
0  2008-11-21  0.153419
1  2008-11-24  0.037421
2  2008-11-25  0.077500

In [11]: reduce(lambda x, y: (1+x)*(1+y)-1, _10['Return'])
Out[11]: 0.28931612705992227

Note that in Python 3, reduce is part of the functools library, though it's a builtin for Python 2.

Randy
  • 14,349
  • 2
  • 36
  • 42
1

Here is mine:

from numpy import prod
df.append(df.apply(lambda col: prod([(1+c) for c in col]) - 1), ignore_index=True)
Alex
  • 12,078
  • 6
  • 64
  • 74
  • Sure, first `import numpy as np` and then: `df.append(df.apply(lambda col: prod([(1+c) for c in col if not np.isnan(c)]) - 1), ignore_index=True)`. If you try it out then let me know if it's not working! – Alex Nov 25 '16 at 20:20