2

I am trying to fill a dataframe (elist) with cumulative returns of firms and cumulative market returns. This can be done by looping through the elist dataframe using iterrows, see this link. However, this is slow.

I am looking for a more potent, faster solution.

The raw returns that are the inputs for the cumulative return calculations originate from two dataframes (ri, rm). The results should then be recorded in columns of elist. See example below, using the data in this file.

Before running the iterrows loop, elist looks like:

    permno  begdat      enddat   return vwretd
0   11628   2012-03-31  2013-03-31  NaN NaN
1   11628   2012-06-30  2013-06-30  NaN NaN
2   11628   2012-09-30  2013-09-30  NaN NaN
3   11628   2012-12-31  2013-12-31  NaN NaN
4   11628   2013-03-31  2014-03-31  NaN NaN

After running the loop elist should look like:

    permno   begdat     enddat      return      vwretd
0   11628   2012-03-31  2013-03-31  0.212355    0.133429
1   11628   2012-06-30  2013-06-30  0.274788    0.198380
2   11628   2012-09-30  2013-09-30  0.243590    0.198079
3   11628   2012-12-31  2013-12-31  0.299277    0.304479
4   11628   2013-03-31  2014-03-31  0.303147    0.208454

Here is the code that relies on iterrows, which is slow:

import os,sys
import pandas as pd
import numpy as np
rm     = pd.read_csv('rm_so.csv') # market return
ri     = pd.read_csv('ri_so.csv') # firm return
elist  = pd.read_csv('elist_so.csv') # table to be filled with cumlative returns over a period (begdat to enddat)
for index, row in elist.iterrows():
    #fill cumulative market return
    elist.loc[index, 'vwretd']=rm.loc[(rm['date']>row['begdat']) & (rm['date']<=row['enddat']),'vwretd'].product()-1    
    #fill cumulative firm return
    r = ri.loc[(ri['permno']==row['permno']),]
    elist.loc[index, 'return'] = r.loc[(r['date']>row['begdat']) & (r['date']<=row['enddat']),'ret'].product()-1

It would be great to see this process running faster!

Community
  • 1
  • 1
Martien Lubberink
  • 2,614
  • 1
  • 19
  • 31

0 Answers0