
Hi, I'm trying to calculate regression betas over an expanding window in pandas. I have the following function to calculate beta:

  def beta(row, col1, col2):
      return numpy.cov(row[col1],row[col2]) / numpy.var(row[col1])
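One thing worth knowing up front: numpy.cov(x, y) returns the full 2×2 covariance matrix, so dividing it by numpy.var gives a matrix rather than the single number an apply function has to return. A minimal sketch of a scalar-returning beta, assuming the slope of col2 regressed on col1 is what's wanted (the helper name `scalar_beta` is hypothetical):

```python
import numpy as np

def scalar_beta(x, y):
    # np.cov(x, y) returns [[var(x), cov(x, y)], [cov(y, x), var(y)]], all with ddof=1
    c = np.cov(x, y)
    # OLS slope of y on x: cov(x, y) / var(x), using the same N-1 normalization
    return c[0, 1] / c[0, 0]

a = np.array([1, 2, 3, 4, 5], dtype=float)
b = np.array([0.1, 5, 0.3, 0.5, 6])

print(np.cov(a, b).shape)  # (2, 2): a matrix, not a scalar
print(scalar_beta(a, b))   # a single float, about 0.73
```

Reading both terms from the same covariance matrix keeps the normalization consistent; mixing np.cov (ddof=1) with np.var's default (ddof=0) would inflate the result by a factor of n/(n-1).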

I've tried the following to get the expanding beta on my DataFrame df:

pandas.expanding_apply(df, beta, col1='col1', col2='col2')
pandas.expanding_apply(df, beta, kwargs={'col1':'col1', 'col2':'col2'})
df.expanding().apply(...)

However, none of them work. I either get an error saying the kwargs aren't being passed through or, if I hardcode the column names in the beta function, I get:

*** IndexError: only integers, slices (`:`), ellipsis (`...`), numpy.newaxis (`None`) and integer or boolean arrays are valid indices
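That IndexError is exactly what NumPy raises when a plain ndarray is indexed with a string. The apply machinery hands the function a 1-D ndarray holding one column's window, not a row or a DataFrame, so row['a'] fails. A minimal reproduction (the variable name `window` is just illustrative):

```python
import numpy as np

# Roughly what the beta function actually receives: a 1-D float array for one column
window = np.array([0.1, 5.0, 0.3])

try:
    window['a']  # label-style indexing, as in row[col1]
except IndexError as exc:
    message = str(exc)
    print(message)
```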

Thanks

Example:

import numpy
import pandas

def beta(row, col1, col2):
    return numpy.cov(row[col1],row[col2]) / numpy.var(row[col1])

df = pandas.DataFrame({'a':[1,2,3,4,5],'b':[.1,5,.3,.5,6]})
pandas.expanding_apply(df, beta, col1='a', col2='b')
pandas.expanding_apply(df, beta, kwargs={'col1':'a', 'col2':'b'})

Both of those raise errors.

JohnE
Michael
  • Could you give us a minimum working example? Then we could maybe check some assumptions instead of guessing. – CodeMonkey Jul 06 '17 at 12:23
  • sure, see above – Michael Jul 06 '17 at 12:30
  • If you instrument your beta function you will see that expanding_apply doesn't give you a row. Rather, it seems to give you only progressive values from the b column and then from the a column: [ 0.1 5. ] [ 0.1 5. 0.3] [ 0.1 5. 0.3 0.5] [ 0.1 5. 0.3 0.5 6. ] [ 1. 2.] [ 1. 2. 3.] [ 1. 2. 3. 4.] [ 1. 2. 3. 4. 5.]. I couldn't really find much info on this function in the docs. – CodeMonkey Jul 06 '17 at 12:46
  • A couple of general remarks: `expanding_apply` is deprecated in favor of `expanding().apply()`, and I added the statsmodels tag because pandas mostly farms out regression-type stuff to statsmodels (or perhaps scipy or sklearn). – JohnE Jul 06 '17 at 14:21
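CodeMonkey's observation can be reproduced with the current `expanding().apply()` API. The sketch below assumes a pandas version where apply accepts `raw=True`; the helper `record` is hypothetical and simply logs each window it is given before returning a scalar:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3, 4, 5], 'b': [.1, 5, .3, .5, 6]})
seen = []

def record(window):
    # Log what pandas passes in, then return a scalar as apply requires
    seen.append(window.copy())
    return float(len(window))

result = df.expanding(min_periods=2).apply(record, raw=True)

for w in seen:
    print(type(w).__name__, w.ndim, w)
```

Every logged object is a 1-D ndarray covering a single column, which is why a two-column function cannot work here.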

1 Answer


I've run into this issue when trying to calculate betas for rolling multiple regression, which is very similar to what you're doing (see here). The key issue is that with Expanding.apply(func, args=(), kwargs={}), the func parameter

Must produce a single value from an ndarray input. *args and **kwargs are passed to the function

[source]
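So args and kwargs do reach the function; the limitation is that it only ever sees one column. For a single-column statistic, forwarding works fine. A minimal sketch, assuming a reasonably recent pandas where `raw` and `kwargs` are keyword arguments of Expanding.apply; `scaled_mean` is a hypothetical example function:

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3, 4, 5], 'b': [.1, 5, .3, .5, 6]})

def scaled_mean(window, scale=1.0):
    # window is a 1-D ndarray (raw=True); return one number per window
    return window.mean() * scale

# kwargs is forwarded to scaled_mean on every call, once per column window
out = df.expanding().apply(scaled_mean, raw=True, kwargs={'scale': 10.0})
print(out)
```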

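Because each column is handed to func separately as a 1-D ndarray, func never sees two columns at once, so there is no direct way to compute a two-column statistic like beta with expanding.apply. (Note: as mentioned, expanding_apply is deprecated.)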
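An indirect trick is possible in newer pandas versions, where apply accepts `raw=False`: apply over just one column so that each window arrives as a Series that keeps its index labels, then use those labels to pull the matching slice of the other column (passed in through kwargs). This is a sketch under that assumption; `window_beta` is a hypothetical helper, and like the workaround below it recomputes every window from scratch:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3, 4, 5], 'b': [.1, 5, .3, .5, 6]})

def window_beta(x_window, y):
    # raw=False: x_window is a Series carrying the original index labels
    y_window = y.loc[x_window.index]
    c = np.cov(x_window.to_numpy(), y_window.to_numpy())
    return c[0, 1] / c[0, 0]  # slope of y regressed on x

betas = df['a'].expanding(min_periods=2).apply(
    window_beta, raw=False, kwargs={'y': df['b']}
)
print(betas)
```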

Below is a workaround. It's more computationally expensive, since the beta for each window is recomputed from scratch, but it gets you the output. It creates a list of expanding-window NumPy arrays and then calculates a beta over each.

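from pandas_datareader.data import DataReader as dr
import numpy as np
import pandas as pd

# Daily returns for GOOG (asset, column 0) and SPY (market, column 1).
# Note: pandas-datareader's 'google' source has since been discontinued;
# any returns DataFrame with the asset in column 0 and the market in column 1 works.
df = (dr(['GOOG', 'SPY'], 'google')['Close']
      .pct_change()
      .dropna())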

# i is the asset, m is the market/index
# [0, 1] grabs cov(i, m) from the covariance matrix;
# ddof=1 makes np.var use the same N-1 normalization as np.cov
def beta(i, m):
    return np.cov(i, m)[0, 1] / np.var(m, ddof=1)

def expwins(x, min_periods):
    return [x[:i] for i in range(min_periods, x.shape[0] + 1)]

# Example:
# arr = np.arange(10).reshape(5, 2)
# print(expwins(arr, min_periods=3)[1])  # the 2nd window of the set
# array([[0, 1],
#        [2, 3],
#        [4, 5],
#        [6, 7]])

min_periods = 21
# Create "blocks" of expanding windows
wins = expwins(df.values, min_periods=min_periods)
# Calculate a beta (single scalar val.) for each
betas = [beta(win[:, 0], win[:, 1]) for win in wins]
betas = pd.Series(betas, index=df.index[min_periods - 1:])

print(betas)
Date
2010-02-03    0.77572
2010-02-04    0.74769
2010-02-05    0.76692
2010-02-08    0.74301
2010-02-09    0.74741
2010-02-10    0.74635
2010-02-11    0.74735
2010-02-12    0.74605
2010-02-16    0.78521
2010-02-17    0.77619
2010-02-18    0.79188
2010-02-19    0.78952
...
2017-06-19    0.97387
2017-06-20    0.97390
2017-06-21    0.97386
2017-06-22    0.97387
2017-06-23    0.97391
2017-06-26    0.97389
2017-06-27    0.97482
2017-06-28    0.97508
2017-06-29    0.97594
2017-06-30    0.97584
2017-07-03    0.97575
2017-07-05    0.97588
dtype: float64
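If building the windows explicitly is too slow, pandas' built-in expanding moments give the same quantity directly: beta is cov(i, m) / var(m), and Expanding.cov and Expanding.var both default to ddof=1. This swaps in a different technique from the loop above; a minimal sketch on synthetic random returns that merely stand in for the GOOG/SPY data, with the loop kept as a cross-check:

```python
import numpy as np
import pandas as pd

# Synthetic daily returns standing in for an asset (i) and the market (m)
rng = np.random.default_rng(0)
m = pd.Series(rng.normal(0, 0.01, 250))
i = 0.9 * m + pd.Series(rng.normal(0, 0.005, 250))

min_periods = 21
# Expanding covariance of i with m divided by expanding variance of m (both ddof=1)
betas_fast = (i.expanding(min_periods=min_periods).cov(m)
              / m.expanding(min_periods=min_periods).var())

# Cross-check against an explicit loop over expanding windows
iv, mv = i.to_numpy(), m.to_numpy()
betas_loop = pd.Series(
    [np.cov(iv[:n], mv[:n])[0, 1] / np.var(mv[:n], ddof=1)
     for n in range(min_periods, len(mv) + 1)],
    index=m.index[min_periods - 1:],
)
print(np.allclose(betas_fast.dropna(), betas_loop))
```

The built-in version updates running sums instead of re-slicing every window, so it scales much better on long histories.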
Brad Solomon