5

Using numpy I can simulate unconditionally from a multivariate normal distribution by

mean = [0, 0]
cov = [[1, 0], [0, 100]]  # diagonal covariance
x, y = np.random.multivariate_normal(mean, cov, 5000).T

How do I simulate y from the same distribution, given that I have 5000 realizations of x? I'm looking for a generalized solution that can be scaled to an arbitrary dimension.

dms_quant
  • 85
  • 1
  • 4
  • 10
  • 1
    If these are independent (covariance suggests so) Couldn't you just generate from a univariate normal distribution (mean=0, variance=100)? – ayhan Aug 02 '16 at 08:22
  • Indeed. This is just a poorly written example - I would like to be able to handle the case of a full rank covariance matrix – dms_quant Aug 02 '16 at 08:43

2 Answers2

8

Looking up in Eaton, Morris L. (1983). Multivariate Statistics: a Vector Space Approach, I gathered following example solution for a 4 vaiable system, with 2 dependent variables (the first two) and 2 independent variables (the last two)

import numpy as np

mean = np.array([1, 2, 3, 4])
cov = np.array(
    [[ 1.0,  0.5,  0.3, -0.1], 
     [ 0.5,  1.0,  0.1, -0.2], 
     [ 0.3,  0.1,  1.0, -0.3], 
     [-0.1, -0.2, -0.3,  0.1]])  # diagonal covariance

c11 = cov[0:2, 0:2] # Covariance matrix of the dependent variables
c12 = cov[0:2, 2:4] # Custom array only containing covariances, not variances
c21 = cov[2:4, 0:2] # Same as above
c22 = cov[2:4, 2:4] # Covariance matrix of independent variables

m1 = mean[0:2].T # Mu of dependent variables
m2 = mean[2:4].T # Mu of independent variables

conditional_data = np.random.multivariate_normal(m2, c22, 1000)

conditional_mu = m2 + c12.dot(np.linalg.inv(c22)).dot((conditional_data - m2).T).T
conditional_cov = np.linalg.inv(np.linalg.inv(cov)[0:2, 0:2])

dependent_data = np.array([np.random.multivariate_normal(c_mu, conditional_cov, 1)[0] for c_mu in conditional_mu])

print np.cov(dependent_data.T, conditional_data.T)
>> [[ 1.0012233   0.49592165  0.28053086 -0.08822537]
    [ 0.49592165  0.98853341  0.11168755 -0.22584691]
    [ 0.28053086  0.11168755  0.91688239 -0.27867207]
    [-0.08822537 -0.22584691 -0.27867207  0.94908911]]

which is acceptably close to the pre-defined covariance matrix. The solution is also briefly described on Wikipedia

dms_quant
  • 85
  • 1
  • 4
  • 10
1

In order to generalize the answer by @dms_quant to an arbitrary number of dimensions and conditional distributions we can add a partition parameter k, which split the covariance matrix into a marginal distributions of z1 and z2. The example below computes the conditional distribution of z1 given z2.

Modified code for an arbitrarily sized cov matrix and conditional value/s of 0:

conditional_values = (len(cov)-k)*[0]
c11 = cov[0:k, 0:k]
c12 = cov[0:k, k:len(cov)]
c21 = cov[k:len(cov), 0:k]
c22 = cov[k:len(cov), k:len(cov)]

m1 = mean[0:k].T
m2 = mean[k:len(cov)].T

conditional_mu = m1 + c12.dot(np.linalg.inv(c22)).dot((conditional_values - m2).T).T
conditional_cov = np.linalg.inv(np.linalg.inv(cov)[0:k, 0:k])
danospanos
  • 41
  • 3