1

I'm trying to use pyMC to provide a Bayesian estimate of a covariance matrix given some data. I'm roughly following the stock covariance example provided in this online guide (link here), but I have a more simplistic example model that I made up. I've got two values that I draw from a multivariate normal distribution, and I've constructed it in such a way that I know the covariance/correlation between the two variables.

I've posted my short code below. Essentially what I'm doing is constructing an artificial data set where the correlation matrix should be [[1, -0.5], [-0.5, 1]]. At the end of the mcmc sampling, I get a predicted value for the off-diagonal term that is quite a bit different. I've looked at the convergence criteria, and it looks like the autocorrelation is low and the distribution is stationary. However, I will admit I'm still wrapping my head around all the nuances here and there could be aspects of this that are still beyond my grasp.

This question is related to and very much based on these other two SO questions (One and Two). I felt the need to ask my own question despite the similarity because I'm not getting the answer I expect to get. If any of you computational statisticians out there can help provide insight into this problem it would be greatly appreciated!

import numpy as np
import pandas as pd
import pymc as pm
import matplotlib.pyplot as plt
import seaborn as sns

p=2
prior_mu=np.ones(p)
prior_sdev=np.ones(p)
prior_corr_inv=np.eye(p)

def cov2corr(A):
    """
    covariance matrix to correlation matrix.
    """
    d = np.sqrt(A.diagonal())
    A = ((A.T / d).T) / d
    #A[ np.diag_indices(A.shape[0]) ] = np.ones( A.shape[0] )
    return A

# construct artificial data set
muVector=[10,5]
sdevVector=[3.,5.]
corrMatrix=np.matrix([[1,-0.5],[-0.5, 1]])
cov_matrix=np.diag(sdevVector)*corrMatrix*np.diag(sdevVector)
n_obs = 500
x = np.random.multivariate_normal(muVector,cov_matrix,n_obs)

prior_mu = np.array(muVector)
prior_std = np.array(sdevVector)

inv_cov_matrix = pm.Wishart( "inv_cov_matrix", n_obs, np.diag(prior_std**2) )
mu = pm.Normal( "returns", prior_mu, 1, size = 2)

# create the model and sample
obs = pm.MvNormal( "observed returns", mu, inv_cov_matrix, observed = True, value = x )
model = pm.Model( [obs, mu, inv_cov_matrix] )
mcmc = pm.MCMC(model)
mcmc.use_step_method(pm.AdaptiveMetropolis,inv_cov_matrix)
mcmc.sample( 1e5, 2e4, 10)

# Determine prediction - Does not equal corrMatrix!
inv_cov_samples = mcmc.trace("inv_cov_matrix")[:]
mean_covariance_matrix = np.linalg.inv( inv_cov_samples.mean(axis=0) )
prediction = cov2corr(mean_covariance_matrix*n_obs)
hobscrk777
  • 2,347
  • 4
  • 23
  • 29

0 Answers0