
I have two square matrices, A and B. A is symmetric, and B is symmetric positive definite. I would like to compute $\operatorname{trace}(A B^{-1})$. For now, I compute the Cholesky decomposition of B, solve for C in the equation $A = C B$, and sum up the diagonal elements of C.

Is there a more efficient way of proceeding?

I plan on using Eigen. Could you provide an implementation if the matrices are sparse (A can often be diagonal, B is often band-diagonal)?
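
For concreteness, here is a minimal sketch of the Cholesky-based approach described above, using Eigen's sparse `SimplicialLLT` solver; the function name is a placeholder and the matrix types are just one possible choice:

```cpp
#include <Eigen/Sparse>
#include <Eigen/SparseCholesky>

// Sketch: trace(A * B^{-1}) via a sparse Cholesky factorization of B.
// Solves B * x_i = a_i for each column a_i of A and accumulates the
// i-th entry of x_i, i.e. the diagonal of B^{-1} * A.
double traceABinv(const Eigen::SparseMatrix<double>& A,
                  const Eigen::SparseMatrix<double>& B)
{
    Eigen::SimplicialLLT<Eigen::SparseMatrix<double>> llt(B); // factor B once
    double trace = 0.0;
    for (int i = 0; i < A.cols(); ++i) {
        Eigen::VectorXd a_i = A.col(i);       // densify column i of A
        Eigen::VectorXd x_i = llt.solve(a_i); // x_i = B^{-1} a_i
        trace += x_i(i);
    }
    return trace; // trace(B^{-1} A) == trace(A B^{-1})
}
```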

yannick

2 Answers


If B is sparse, it may be efficient (i.e., O(n) per solve, assuming a good condition number of B) to solve for x_i in

B x_i = a_i

(sample Conjugate Gradient code is given on Wikipedia). Taking the a_i to be the column vectors of A, you get the matrix B^{-1} A in O(n^2). Then you can sum the diagonal elements to get the trace; note that trace(B^{-1} A) = trace(A B^{-1}). Generally, it's easier to do this sparse inverse multiplication than to compute the full set of eigenvalues. For comparison, a dense Cholesky decomposition is O(n^3), though a sparse factorization is typically much faster (see Darren Engwirda's comment below).
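
A minimal sketch of this column-by-column approach, assuming Eigen's built-in `ConjugateGradient` solver rather than hand-rolled CG (the function name is illustrative):

```cpp
#include <Eigen/Sparse>
#include <Eigen/IterativeLinearSolvers>

// Sketch: trace(A * B^{-1}) by CG, solving B x_i = a_i for each column a_i of A.
double traceViaCG(const Eigen::SparseMatrix<double>& A,
                  const Eigen::SparseMatrix<double>& B)
{
    // Use both triangles of the symmetric B; CG requires B positive definite.
    Eigen::ConjugateGradient<Eigen::SparseMatrix<double>,
                             Eigen::Lower | Eigen::Upper> cg(B);
    double trace = 0.0;
    for (int i = 0; i < A.cols(); ++i) {
        Eigen::VectorXd a_i = A.col(i);      // column i of A, densified
        Eigen::VectorXd x_i = cg.solve(a_i); // i-th column of B^{-1} A
        trace += x_i(i);                     // accumulate the diagonal entry
    }
    return trace;
}
```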

If you only need an approximation to the trace, you can actually reduce the cost to O(q n) by averaging

r^T (A B^{-1}) r

over q random vectors r. Usually q << n. This is an unbiased estimate provided that the components of the random vector r satisfy

$\langle r_i r_j \rangle = \delta_{ij},$

where $\langle \cdots \rangle$ indicates an average over the distribution of r. For example, the components r_i could be independent Gaussian variables with unit variance, or they could be drawn uniformly from ±1. Typically the trace scales like O(n) and the error in the trace estimate scales like O(sqrt(n/q)), so the relative error scales as O(sqrt(1/(n q))).
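
A minimal sketch of this stochastic estimator, again assuming Eigen's `ConjugateGradient`, with ±1 (Rademacher) random vectors; `q` and the RNG seed are arbitrary choices, not part of the question:

```cpp
#include <random>
#include <Eigen/Sparse>
#include <Eigen/IterativeLinearSolvers>

// Sketch: estimate trace(A * B^{-1}) from q random +/-1 vectors r,
// computed as r^T A (B^{-1} r), i.e. one CG solve per sample.
double traceEstimate(const Eigen::SparseMatrix<double>& A,
                     const Eigen::SparseMatrix<double>& B, int q)
{
    Eigen::ConjugateGradient<Eigen::SparseMatrix<double>,
                             Eigen::Lower | Eigen::Upper> cg(B);
    std::mt19937 rng(0);
    std::bernoulli_distribution coin(0.5);
    const int n = B.rows();
    double sum = 0.0;
    for (int k = 0; k < q; ++k) {
        Eigen::VectorXd r(n);
        for (int i = 0; i < n; ++i)
            r(i) = coin(rng) ? 1.0 : -1.0;   // components satisfy <r_i r_j> = delta_ij
        Eigen::VectorXd x = cg.solve(r);     // x = B^{-1} r
        sum += r.dot(A * x);                 // r^T A B^{-1} r
    }
    return sum / q;                          // unbiased estimate of the trace
}
```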

Kipton Barros
  • Thanks for your answer. How do you do the averaging with r? From what you write, it seems like you need to compute A.B^{-1} explicitly, which is probably not what you wanted to say. – yannick Sep 22 '11 at 08:10
  • Kipton probably means you should compute r^T A B^{-1} r by first solving B x = r and then computing r^T A x. But I do not see how he gets a cost of O(n) for the probabilistic approach: solving n systems with cost O(n) each gives a cost of O(n^2). Perhaps the number of random vectors can be taken smaller than n = size of A? – Jitse Niesen Sep 22 '11 at 08:38
  • 1
    @Kipton Barros: The complexity for a *sparse* Cholesky factorisation is definitely not `O(N^3)` - that would be a dense factorisation, the sparse case is typically much, much faster. Using a `PCG` solver to solve `N` linear systems would result in *at least* `O(N|B|)` complexity, which would only be `O(N^2)` in the case of diagonal `B`. I would probably benchmark against a "good" sparse factorisation package, perhaps `CHOLMOD`. – Darren Engwirda Sep 22 '11 at 21:57
  • @Darren, Thanks for the correction about Cholesky decomposition. – Kipton Barros Sep 22 '11 at 22:09

If generalized eigenvalues are more efficient to compute, you can solve the generalized eigenvalue problem A v = lambda B v and then sum up all the lambdas: the generalized eigenvalues of (A, B) are the eigenvalues of B^{-1} A, so their sum equals trace(B^{-1} A) = trace(A B^{-1}).
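
A minimal sketch with Eigen's `GeneralizedSelfAdjointEigenSolver`; that solver works on dense matrices, so this sketch assumes the dense case and does not exploit sparsity:

```cpp
#include <Eigen/Dense>

// Sketch: trace(A * B^{-1}) as the sum of the generalized eigenvalues of
// A v = lambda B v (A symmetric, B symmetric positive definite).
// Dense solve, O(n^3).
double traceViaGenEig(const Eigen::MatrixXd& A, const Eigen::MatrixXd& B)
{
    Eigen::GeneralizedSelfAdjointEigenSolver<Eigen::MatrixXd> ges(A, B);
    return ges.eigenvalues().sum(); // sum of lambdas = trace(B^{-1} A)
}
```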

Chris A.