
I have defined a weighted covariance (COVAR) matrix and now want to compute it over time, that is, with a rolling window of 60 observations. As an example, I will use the population covariance matrix:

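import numpy as np

def cm(data):
    # Population covariance matrix of the columns of a DataFrame
    data = data.values
    row_data = data.shape[0]
    col_data = data.shape[1]

    cov_mat = np.zeros([col_data, col_data])

    for i in range(0, col_data):
        for j in range(0, col_data):
            mean_1 = np.mean(data[:,i])
            mean_2 = np.mean(data[:,j])
            total = 0

            # Sum of cross-deviations of columns i and j over all rows
            for k in range(0, row_data):
                total = total + (data[k][i]-mean_1)*(data[k][j]-mean_2)

            # Population covariance: divide by the number of rows
            cov_mat[i][j] = total * (1/row_data)

    return cov_mat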
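
For reference, the triple loop above can be written without Python loops. This is only a sketch: the name `cm_vectorized` and the random demo frame are assumptions made for illustration, not part of the original code. It should give the same numbers as `np.cov(..., bias=True)`.

```python
import numpy as np
import pandas as pd

def cm_vectorized(data: pd.DataFrame) -> np.ndarray:
    """Population covariance (divide by N) of the columns, without loops."""
    x = data.to_numpy(dtype=float)
    centered = x - x.mean(axis=0)              # subtract each column's mean
    return centered.T @ centered / x.shape[0]  # cross-products divided by N

# Hypothetical demo data: 100 rows, 3 columns
rng = np.random.default_rng(0)
demo = pd.DataFrame(rng.normal(size=(100, 3)), columns=["a", "b", "c"])

# Compare against NumPy's population covariance (bias=True means divide by N)
print(np.allclose(cm_vectorized(demo),
                  np.cov(demo.to_numpy(), rowvar=False, bias=True)))
```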

For this particular scenario, how can I efficiently compute the matrix over a rolling window?

UPDATE:

After some trial-and-error, I've managed to solve part of my own problem by including a for loop which iterates over the rolling periods:

In:

rolling_window = 60

def cm(data):
    data = data.values
    row_data = data.shape[0]
    col_data = data.shape[1]

    # Define the number of rolls that have to be made:
    rolls = row_data - rolling_window

    # Define an empty list which will be filled with COV/VAR matrices:
    cov_mat_main = []

    for t in range(rolls):
        cov_mat = np.zeros([col_data, col_data])

        for i in range(0, col_data):
            for j in range(0, col_data):
                mean_1 = np.mean(data[t:rolling_window+t,i])
                mean_2 = np.mean(data[t:rolling_window+t,j])

                total = 0
                for k in range(t, rolling_window+t):
                    total = total + (data[k][i]-mean_1)*(data[k][j]-mean_2)

                cov_mat[i][j] = total * (1/row_data)

        cov_mat_main.append(cov_mat)

    cov_mat_main = np.array(cov_mat_main)

    return cov_mat_main

cm(df)

Out:

[[ 5.81310317e-07 -1.37889464e-06 -3.57360335e-07]
  [-1.37889464e-06  8.73264313e-06  6.19930936e-06]
  [-3.57360335e-07  6.19930936e-06  9.02566589e-06]]

 [[ 4.03349133e-07 -1.31881055e-06 -6.03769261e-07]
  [-1.31881055e-06  8.76683970e-06  6.26991034e-06]
  [-6.03769261e-07  6.26991034e-06  8.68739335e-06]]]

However, the output of this function does not match the output of the built-in function.

In:

cm = df.rolling(rolling_window).cov()

Out:

 [[ 4.50638342e-06 -1.47342972e-05 -6.74556002e-06]
  [-1.47342972e-05  9.79467608e-05  7.00500328e-05]
  [-6.74556002e-06  7.00500328e-05  9.70591532e-05]]

 [[ 3.41189600e-06 -9.47500359e-06 -4.76181287e-06]
  [-9.47500359e-06  7.50918104e-05  5.93125976e-05]
  [-4.76181287e-06  5.93125976e-05  9.40643303e-05]]]

The data frame has no missing values, so missing data cannot explain the difference between my matrices and the .cov() matrices.

Hopefully someone can spot the mistake.

Any suggestions?

Max
  • Why don't you use the things built into NumPy and Pandas? – Nils Werner Mar 03 '19 at 16:31
  • Because I created a specifically weighted COV/VAR matrix for which no 'built-in' function exists. I am just using the defined population covariance matrix as an example. – Max Mar 03 '19 at 17:50

1 Answer


After some trial-and-error, I've managed to solve my own problem.

For anyone interested in the solution:

import numpy as np

rolling_window = 30

def cm(data):
    data = data.values
    row_data = data.shape[0]
    col_data = data.shape[1]

    # Number of rolls to take, i.e. the number of VAR/COV matrices to calculate
    rolls = row_data - rolling_window

    # Empty list to which one VAR/COV matrix is appended per roll
    cov_mat_main = []

    for t in range(rolls):
        cov_mat = np.zeros([col_data, col_data])
        # Estimation window, shifted forward by one time unit
        begin_est = t+1
        end_est = rolling_window+t+1

        for i in range(0, col_data):
            for j in range(0, col_data):
                mean_1 = np.mean(data[begin_est:end_est,i])
                mean_2 = np.mean(data[begin_est:end_est,j])
                total = 0

                for k in range(begin_est, end_est):
                    total = total + (data[k][i]-mean_1)*(data[k][j]-mean_2)
                # Sample covariance: divide by (window length - 1)
                cov_mat[i][j] = total * (1/(rolling_window-1))

        cov_mat_main.append(cov_mat)

    cov_mat_main = np.array(cov_mat_main)

    return cov_mat_main

print(cm(df))

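I had to take three things into account:

  • The degrees of freedom: .cov() computes the sample covariance, so the divisor is N - 1 rather than N
  • Dividing 'total' by the window length (rolling_window) instead of the total number of rows (row_data)
  • Shifting the start and end of the estimation window forward by 1 time unit

to align it with the .cov() function.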
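
The first two points can be checked on a single window. The snippet below is a minimal sketch on made-up data: `df_demo`, `window`, and the other names are assumptions for illustration. It compares the last window's covariance with the pandas result for the last row, and shows that dividing by the total number of rows only rescales the matrix by a constant factor.

```python
import numpy as np
import pandas as pd

# Hypothetical demo data: 50 rows, 3 columns
rng = np.random.default_rng(1)
df_demo = pd.DataFrame(rng.normal(size=(50, 3)), columns=["a", "b", "c"])
window = 10

# Sum of cross-deviations over the last `window` rows
last = df_demo.to_numpy()[-window:]
centered = last - last.mean(axis=0)
total = centered.T @ centered

sample_cov = total / (window - 1)     # window length, with N - 1 degrees of freedom
wrong_cov = total / len(df_demo)      # divisor used in the question (all rows)

# pandas returns a MultiIndex frame; select the matrix for the last timestamp
pandas_last = df_demo.rolling(window).cov().loc[df_demo.index[-1]].to_numpy()

print(np.allclose(sample_cov, pandas_last))
# The two divisors differ by the constant factor len(df_demo) / (window - 1)
print(np.allclose(wrong_cov * len(df_demo) / (window - 1), sample_cov))
```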

The function above outputs:

 [[ 4.50638342e-06 -1.47342972e-05 -6.74556002e-06]
  [-1.47342972e-05  9.79467608e-05  7.00500328e-05]
  [-6.74556002e-06  7.00500328e-05  9.70591532e-05]]

 [[ 3.41189600e-06 -9.47500359e-06 -4.76181287e-06]
  [-9.47500359e-06  7.50918104e-05  5.93125976e-05]
  [-4.76181287e-06  5.93125976e-05  9.40643303e-05]]]

This matches the output of df.rolling(rolling_window).cov():

 [[ 4.50638342e-06 -1.47342972e-05 -6.74556002e-06]
  [-1.47342972e-05  9.79467608e-05  7.00500328e-05]
  [-6.74556002e-06  7.00500328e-05  9.70591532e-05]]

 [[ 3.41189600e-06 -9.47500359e-06 -4.76181287e-06]
  [-9.47500359e-06  7.50918104e-05  5.93125976e-05]
  [-4.76181287e-06  5.93125976e-05  9.40643303e-05]]]
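
For larger data frames, the nested loops can be replaced by a vectorized version. The following is a sketch under assumptions, not a definitive implementation: `rolling_cov` and its `weights` parameter are hypothetical names, and the weighted branch uses one common definition (weights normalized to sum to 1, no bias correction), which may differ from the custom weighting the question had in mind. It relies on `numpy.lib.stride_tricks.sliding_window_view`, available in NumPy 1.20 and later.

```python
import numpy as np
import pandas as pd
from numpy.lib.stride_tricks import sliding_window_view

def rolling_cov(data, window, weights=None, ddof=1):
    """Rolling covariance matrices, shape (n_rows - window + 1, n_cols, n_cols)."""
    x = data.to_numpy(dtype=float)
    # windows[t, i, k] is row t + k of column i; shape (n_windows, n_cols, window)
    windows = sliding_window_view(x, window, axis=0)

    if weights is None:
        # Unweighted: divide by (window - ddof), like the loop version with ddof=1
        centered = windows - windows.mean(axis=-1, keepdims=True)
        return np.einsum("tik,tjk->tij", centered, centered) / (window - ddof)

    # Assumed weighting scheme: one weight per position in the window
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    mean = (windows * w).sum(axis=-1, keepdims=True)   # weighted column means
    centered = windows - mean
    return np.einsum("k,tik,tjk->tij", w, centered, centered)

# Hypothetical demo data: 40 rows, 3 columns
rng = np.random.default_rng(2)
demo = pd.DataFrame(rng.normal(size=(40, 3)), columns=["a", "b", "c"])

result = rolling_cov(demo, 10)
# pandas gives NaN for the first window - 1 rows, so compare from row window - 1 on
reference = demo.rolling(10).cov().to_numpy().reshape(40, 3, 3)[9:]
print(result.shape, np.allclose(result, reference))
```

Note that this returns one matrix more than the loop in the answer: the loop starts its first window at row 1, so it omits the window covering rows 0 to rolling_window - 1, for which .cov() already returns a full matrix.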
Max