0

I have a DatetimeIndex indexed dataframe with two columns. The index is uneven.

          A        B
Date
2016-01-04  1   20
2016-01-12  2   10
2016-01-21  3   10
2016-01-25  2   20
2016-02-08  2   30
2016-02-15  1   20
2016-02-21  3   20
2016-02-25  2   20

I want to compute the dot product of time-series A and B over a rolling window of length 20 days.

It should return:

            dot
Date
2016-01-04  Nan   
2016-01-12  Nan   
2016-01-21  Nan   
2016-01-25  110    
2016-02-08  130 
2016-02-15  80    
2016-02-21  140 
2016-02-25  180 

here is how this is obtained:

110 = 2*10+3*10+2*20 (product obtained in period from 2016-01-06 to 2016-01-25 included)

130 = 3*10+2*20+2*30 (product obtained in period from 2016-01-20 to 2016-02-08)

80 = 1*20+2*30 (product obtained in period from 2016-01-27 to 2016-02-15)

140 = 3*20+1*20+2*30 (product obtained in period from 2016-02-02 to 2016-02-21)

180 = 2*20+3*20+1*20+2*30 (product obtained in period from 2016-02-06 to 2016-02-25)

The dot product is an example that should be generalizable to any function taking two series and returning a value.

cs95
  • 379,657
  • 97
  • 704
  • 746
Laura
  • 360
  • 2
  • 15

1 Answers1

0

I think this should work. df.product() across rows, the df.rolling(period).sum()

Dates = pd.to_datetime(['2016-01-04',  
                    '2016-01-12',  
                    '2016-01-21',  
                    '2016-01-25',  
                    '2016-02-08',  
                    '2016-02-15', 
                    '2016-02-21',
                    '2016-02-25',
                    '2016-02-26'
                   ]
                  )

data = {'A': [i*10 for i in range(1,10)], 'B': [i for i in range(1,10)]}
df1 = pd.DataFrame(data = data, index = Dates)
df2 = df1.product(axis =1).rolling(3).sum()
df2.columns = 'Dot'
df2

output


2016-01-04       NaN
2016-01-12       NaN
2016-01-21     140.0
2016-01-25     290.0
2016-02-08     500.0
2016-02-15     770.0
2016-02-21    1100.0
2016-02-25    1490.0
2016-02-26    1940.0
dtype: float64

And if your data is daily and you want to get 20 days data first, group them by 20 days and sum them up, or use last, according to what you want.

Dates1 = pd.date_range(start='2016-03-31', end = '2016-07-31')
data1 = {'A': [np.pi * i * np.random.rand() 
               for i in range(1, len(Dates1) + 1)], 
         'B': [i * np.random.randn() * 10 
               for i in range(1, len(Dates1) + 1)]}
df3 = pd.DataFrame(data = data1, index = Dates1)
df3.groupby(pd.TimeGrouper(freq = '20d')).sum()

                     A             B
2016-03-31   274.224084   660.144639
2016-04-20  1000.456615 -2403.034012
2016-05-10  1872.422495 -1737.571080
2016-05-30  2121.497529  1157.710510
2016-06-19  3084.569208 -1854.258668
2016-07-09  3324.775922 -9743.113805
2016-07-29   505.162678 -1179.730820

and then use dot product like I did above.

rko
  • 194
  • 2
  • 10