Pandas: Cumulative return function

Question

I have a dataframe such as the following:

  Index      Return
2008-11-21   0.153419
2008-11-24   0.037421
2008-11-25   0.077500

What's the best way to calculate a cumulative return across all columns on the last row?

Following is the intended result:

  Index      Return
2008-11-21   0.153419
2008-11-24   0.037421
2008-11-25   0.077500
Cumulative   0.289316

Where cumulative return calculated as follows:

cumulative = (1 + return1) * (1 + return2) * (1 + return3) - 1

What is the best way to perform this in pandas?

Steven G · Accepted Answer · 2016-11-25T20:24:20.003

there is a pandas cumprod() method for that. this will work for every columns.

df.ix["Cumulative"] = ((df+1).cumprod()-1).iloc[-1]

this would be about 2 time faster than other solutions on large dataset:

In[106]: %timeit df.ix["Cumulative"] = ((df+1).cumprod()-1).iloc[-1]
10 loops, best of 3: 18.4 ms per loop
In[107]: %timeit df.ix['Cummulative'] = df.apply(lambda x: (x+1).prod()-1)
10 loops, best of 3: 32.9 ms per loop
In[110]: %timeit df.append(df.iloc[:,1:].apply(lambda col: (col + 1).prod() - 1), ignore_index=True)
10 loops, best of 3: 37.1 ms per loop
In[113]: %timeit df.append(df.apply(lambda col: prod([(1+c) for c in col]) - 1), ignore_index=True)
1 loop, best of 3: 262 ms per loop

I would suggest to never use apply if you can find a built-in method since apply is looping over the dataframe which makes it slow. Bult-in method are highly efficient and normally there is no way you are going to get faster than them using apply.

that was brilliantly faster!! but i have one column that includes NaN. Any workaround to skip those and not return NaN? — Kelaref, Nov 25 '16 at 21:01
`df.ix["Cumulative"] = ((df.fillna(0)+1).cumprod()-1).iloc[-1]` would replace NaN with 0 return. — Steven G, Nov 25 '16 at 21:03

TheF1rstPancake · Answer 2 · 2016-11-25T20:01:06.297

4

Another solution:

df.ix["Cumulative"] = (df['Return']+1).prod() - 1

This will add 1 to the df['Return'] column, multiply all the rows together, and then subtract one from the result. This will result in a simple float value. The result will then be placed at the index "Cumulative". Since that index doesn't exist yet, it will be appended to the end of the DataFrame:

               Return
2008-11-21   0.153419
2008-11-25   0.077500
2008-11-24   0.037421
Cummulative  0.289316

If you want to apply this across multiple columns:

df.ix['Cummulative'] = df.apply(lambda x: (x+1).prod()-1)

This would output the following (I made a second column called "Return2" that is a copy of "Return"):

               Return   Return2
2008-11-21   0.153419  0.153419
2008-11-25   0.077500  0.077500
2008-11-24   0.037421  0.037421
Cummulative  0.289316  0.289316

edited Nov 25 '16 at 20:01

answered Nov 25 '16 at 19:42

TheF1rstPancake

2,318
17
17

Thx @Jalepeno112, but how to apply across all columns? – Kelaref Nov 25 '16 at 19:56
1

Updated my answer. Missed that part of your question. It's very similar to @Psidom's answer but I think this one is more readable. – TheF1rstPancake Nov 25 '16 at 20:05
1

No problem. You should take a look at @StevenG's answer though. – TheF1rstPancake Nov 25 '16 at 20:19

Psidom · Answer 3 · 2016-11-25T19:40:35.200

2

With pandas, you can use the prod() method:

df.append(df.iloc[:,1:].apply(lambda col: (col + 1).prod() - 1), ignore_index=True)

#        Index    Return
#0  2008-11-21  0.153419
#1  2008-11-24  0.037421
#2  2008-11-25  0.077500
#3         NaN  0.289316

Or as @Randy C commented, this can be further simplified to:

df.append((df.iloc[:,1:] + 1).prod() - 1, ignore_index=True)

edited Nov 25 '16 at 19:40

answered Nov 25 '16 at 19:33

Psidom

209,562
33
339
356

1

Nice one. For just doing the calculation, it can be simplified a little to `(df['Return']+1).prod() - 1`. – Randy Nov 25 '16 at 19:37
@RandyC Yeah, that is a more succinct way for this problem. – Psidom Nov 25 '16 at 19:41
Thx @Psidom , i did df = the above, it works great, except that my date index disappeared, and the cum return under the first column returns a NaN, despite not including any NaNs and the cumulative is not equal to zero. Any ideas why? – Kelaref Nov 25 '16 at 19:54

score 1 · Answer 4 · answered Nov 25 '16 at 19:29

One option is to just use reduce, though others might be able to come up with faster vectorized methods:

In [10]: pd.read_clipboard()
Out[10]:
        Index    Return
0  2008-11-21  0.153419
1  2008-11-24  0.037421
2  2008-11-25  0.077500

In [11]: reduce(lambda x, y: (1+x)*(1+y)-1, _10['Return'])
Out[11]: 0.28931612705992227

Note that in Python 3, reduce is part of the functools library, though it's a builtin for Python 2.

score 1 · Answer 5 · answered Nov 25 '16 at 19:37

1

Here is mine:

from numpy import prod
df.append(df.apply(lambda col: prod([(1+c) for c in col]) - 1), ignore_index=True)

answered Nov 25 '16 at 19:37

Alex

12,078
6
64
74

Sure, first `import numpy as np` and then: `df.append(df.apply(lambda col: prod([(1+c) for c in col if not np.isnan(c)]) - 1), ignore_index=True)`. If you try it out then let me know if it's not working! – Alex Nov 25 '16 at 20:20

Pandas: Cumulative return function

5 Answers5