R: how to compute the mean and covariance of a truncated normal distribution

Question

I'm interested in finding the mean and covariance of a truncated normal random vector. Suppose Y is a vector containing [Y1 Y2 Y3]. Y follows a multivariate normal distribution with the following mean and covariance:

mu <- c(0.5, 0.5, 0.5)
sigma <- matrix(c(  1,  0.6, 0.3,
                    0.6,    1, 0.2,
                    0.3,  0.2,   2), 3, 3)

The truncation region is the set of Ys such that AY >= 0. For instance,

A <- matrix(c(1, -2, -0.5, 1.5, -2, 0, 3, -1, -1, 4, 0, -2), byrow = TRUE, nrow = 4)
> A
     [,1] [,2] [,3]
[1,]  1.0   -2 -0.5
[2,]  1.5   -2  0.0
[3,]  3.0   -1 -1.0
[4,]  4.0    0 -2.0

For the following draw of Y, it does not satisfy AY >= 0:

set.seed(3)
Y <- rmvnorm(n = 1, mean = mu, sigma = sigma)
> all(A %*% as.matrix(t(Y)) >= 0)
[1] FALSE

But for other draws of Y, they will satisfy AY >= 0, and I want to find the mean and covariance of those Ys that satisfy AY >= 0.

There are existing packages in R that compute the mean and covariance of a truncated normal distribution. For example, mtmvnorm from the tmvtnorm package:

library(tmvtnorm)
mtmvnorm(mu, sigma, lower = ???, upper = ???)

However, the truncation set that I have, i.e, set of Ys that satisfy AY >= 0, cannot be described by just lower and upper bounds. Is there another way to R to compute the mean and covariance of a truncated normal?

Perhaps you can ask this question on [CrossValidated](https://stats.stackexchange.com/)? — Martin Gal, Jun 05 '21 at 21:31
@MartinGal I did at first. But people voted to close the question.. — Adrian, Jun 06 '21 at 00:43

ThomasIsCoding · Accepted Answer · 2021-06-06T21:46:25.233

You had correct understanding (or maybe noticed) that this is NOT truncated multivariate normal distribution. You have AY>=0 as a linear constraint over Y, rather than simple element-wise lower/upper bounds.

If you are not a math guy, i.e., pursuing explicit solutions of mean and covariance, I guess a straightforward and efficient way is using Monte Carlo simulation.

More specifically, you can presume a sufficient large N to generate big enough set of samples Y and then filter out the samples that satisfy the constraint AY>=0. In turn, you can compute mean and covariance over the selected samples. An attempt is given as below

N <- 1e7
Y <- rmvnorm(n = N, mean = mu, sigma = sigma)
Y_h <- subset(Y, colSums(tcrossprod(A, Y) >= 0) == nrow(A))
mu_h <- colMeans(Y_h)
sigma_h <- cov(Y_h)

and you will see

> mu_h
[1]  0.8614791 -0.1365222 -0.3456582
> sigma_h
          [,1]       [,2]       [,3]
[1,] 0.5669915 0.29392671 0.37487421
[2,] 0.2939267 0.36318397 0.07193513
[3,] 0.3748742 0.07193513 1.37194669

Another way follows the similar idea, but we can presume the set size of selected samples, i.e., N samples Y all should make AY>=0 stand. Then we can use while loop to do this

N <- 1e6
Y_h <- list()
nl <- 0
while (nl < N) {
  Y <- rmvnorm(n = N, mean = mu, sigma = sigma)
  v <- subset(Y, colSums(tcrossprod(A, Y) >= 0) == nrow(A))
  nl <- nl + nrow(v)
  Y_h[[length(Y_h) + 1]] <- v
}
Y_h <- head(do.call(rbind, Y_h), N)
mu_h <- colMeans(Y_h)
sigma_h <- cov(Y_h)

and you will see

> mu_h
[1]  0.8604944 -0.1364895 -0.3463887
> sigma_h
          [,1]       [,2]       [,3]
[1,] 0.5683498 0.29492573 0.37524248
[2,] 0.2949257 0.36352022 0.07252898
[3,] 0.3752425 0.07252898 1.37427521

Note: The advantage of the second option is that, it gives you the sufficient large number of selected Y_h as you want.

thanks. Is there any way to speed up the second solution? I posted the question on parallelizing the while loop (it was really getting at trying to speed up the second solution). I've tried the solution with `future` but am wondering if there's any other way to speed it up. — Adrian, Jun 06 '21 at 21:37
@Adrian I saw your question. I don't think `future` could speed up your code. You can try my updated answer https://stackoverflow.com/a/67863714/12158757 — ThomasIsCoding, Jun 06 '21 at 21:40
@Adrian I updated the answer to this question as well, so it not slow any more. — ThomasIsCoding, Jun 06 '21 at 21:46

R: how to compute the mean and covariance of a truncated normal distribution

1 Answers1