
I have two square matrices, A and B. A is symmetric, and B is symmetric positive definite. I would like to compute $\operatorname{trace}(A B^{-1})$. For now, I compute the Cholesky decomposition of B, solve for C in the equation $A = C B$, and sum up the diagonal elements of C.

Is there a more efficient way of proceeding?

I plan on using Eigen. Could you provide an implementation if the matrices are sparse (A can often be diagonal, B is often band-diagonal)?
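
For concreteness, here is a minimal sketch of the Cholesky-based approach described above, using Eigen's sparse `SimplicialLLT` solver; the function name is a placeholder and the matrix types are just one possible choice:

```cpp
#include <Eigen/Sparse>
#include <Eigen/SparseCholesky>

// Sketch: trace(A * B^{-1}) via a sparse Cholesky factorization of B.
// Solves B * x_i = a_i for each column a_i of A and accumulates the
// i-th entry of x_i, i.e. the diagonal of B^{-1} * A.
double traceABinv(const Eigen::SparseMatrix<double>& A,
                  const Eigen::SparseMatrix<double>& B)
{
    Eigen::SimplicialLLT<Eigen::SparseMatrix<double>> llt(B); // factor B once
    double trace = 0.0;
    for (int i = 0; i < A.cols(); ++i) {
        Eigen::VectorXd a_i = A.col(i);       // densify column i of A
        Eigen::VectorXd x_i = llt.solve(a_i); // x_i = B^{-1} a_i
        trace += x_i(i);
    }
    return trace; // trace(B^{-1} A) == trace(A B^{-1})
}
```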

yannick

2 Answers


If B is sparse, it may be efficient (i.e., O(n) per solve, assuming a good condition number of B) to solve for x_i in

B x_i = a_i

(sample Conjugate Gradient code is given on Wikipedia). Taking the a_i to be the column vectors of A, you get the matrix B^{-1} A in O(n^2). Then you can sum the diagonal elements to get the trace; note that trace(B^{-1} A) = trace(A B^{-1}). Generally, it's easier to do this sparse inverse multiplication than to compute the full set of eigenvalues. For comparison, a dense Cholesky decomposition is O(n^3), though a sparse factorization is typically much faster (see Darren Engwirda's comment below).
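
A minimal sketch of this column-by-column approach, assuming Eigen's built-in `ConjugateGradient` solver rather than hand-rolled CG (the function name is illustrative):

```cpp
#include <Eigen/Sparse>
#include <Eigen/IterativeLinearSolvers>

// Sketch: trace(A * B^{-1}) by CG, solving B x_i = a_i for each column a_i of A.
double traceViaCG(const Eigen::SparseMatrix<double>& A,
                  const Eigen::SparseMatrix<double>& B)
{
    // Use both triangles of the symmetric B; CG requires B positive definite.
    Eigen::ConjugateGradient<Eigen::SparseMatrix<double>,
                             Eigen::Lower | Eigen::Upper> cg(B);
    double trace = 0.0;
    for (int i = 0; i < A.cols(); ++i) {
        Eigen::VectorXd a_i = A.col(i);      // column i of A, densified
        Eigen::VectorXd x_i = cg.solve(a_i); // i-th column of B^{-1} A
        trace += x_i(i);                     // accumulate the diagonal entry
    }
    return trace;
}
```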

If you only need an approximation to the trace, you can actually reduce the cost to O(q n) by averaging

r^T (A B^{-1}) r

over q random vectors r. Usually q << n. This is an unbiased estimate provided that the components of the random vector r satisfy

$\langle r_i r_j \rangle = \delta_{ij},$

where $\langle \cdots \rangle$ indicates an average over the distribution of r. For example, the components r_i could be independent Gaussian variables with unit variance, or they could be drawn uniformly from ±1. Typically the trace scales like O(n) and the error in the trace estimate scales like O(sqrt(n/q)), so the relative error scales as O(sqrt(1/(n q))).
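
A minimal sketch of this stochastic estimator, again assuming Eigen's `ConjugateGradient`, with ±1 (Rademacher) random vectors; `q` and the RNG seed are arbitrary choices, not part of the question:

```cpp
#include <random>
#include <Eigen/Sparse>
#include <Eigen/IterativeLinearSolvers>

// Sketch: estimate trace(A * B^{-1}) from q random +/-1 vectors r,
// computed as r^T A (B^{-1} r), i.e. one CG solve per sample.
double traceEstimate(const Eigen::SparseMatrix<double>& A,
                     const Eigen::SparseMatrix<double>& B, int q)
{
    Eigen::ConjugateGradient<Eigen::SparseMatrix<double>,
                             Eigen::Lower | Eigen::Upper> cg(B);
    std::mt19937 rng(0);
    std::bernoulli_distribution coin(0.5);
    const int n = B.rows();
    double sum = 0.0;
    for (int k = 0; k < q; ++k) {
        Eigen::VectorXd r(n);
        for (int i = 0; i < n; ++i)
            r(i) = coin(rng) ? 1.0 : -1.0;   // components satisfy <r_i r_j> = delta_ij
        Eigen::VectorXd x = cg.solve(r);     // x = B^{-1} r
        sum += r.dot(A * x);                 // r^T A B^{-1} r
    }
    return sum / q;                          // unbiased estimate of the trace
}
```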

Kipton Barros
  • Thanks for your answer. How do you do the averaging with r? From what you write, it seems like you need to compute A.B^{-1} explicitly, which is probably not what you wanted to say. – yannick Sep 22 '11 at 08:10
  • Kipton probably means you should compute r^T A B^{-1} r by first solving B x = r and then computing r^T A x. But I do not see how he gets a cost of O(n) for the probabilistic approach: solving n systems with cost O(n) each gives a cost of O(n^2). Perhaps the number of random vectors can be taken smaller than n = size of A? – Jitse Niesen Sep 22 '11 at 08:38
  • 1
    @Kipton Barros: The complexity for a *sparse* Cholesky factorisation is definitely not `O(N^3)` - that would be a dense factorisation, the sparse case is typically much, much faster. Using a `PCG` solver to solve `N` linear systems would result in *at least* `O(N|B|)` complexity, which would only be `O(N^2)` in the case of diagonal `B`. I would probably benchmark against a "good" sparse factorisation package, perhaps `CHOLMOD`. – Darren Engwirda Sep 22 '11 at 21:57
  • @Darren, Thanks for the correction about Cholesky decomposition. – Kipton Barros Sep 22 '11 at 22:09

If generalized eigenvalues are more efficient to compute, you can solve the generalized eigenvalue problem A v = lambda B v and then sum up all the lambdas: the generalized eigenvalues of (A, B) are the eigenvalues of B^{-1} A, so their sum equals trace(B^{-1} A) = trace(A B^{-1}).
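
A minimal sketch with Eigen's `GeneralizedSelfAdjointEigenSolver`; that solver works on dense matrices, so this sketch assumes the dense case and does not exploit sparsity:

```cpp
#include <Eigen/Dense>

// Sketch: trace(A * B^{-1}) as the sum of the generalized eigenvalues of
// A v = lambda B v (A symmetric, B symmetric positive definite).
// Dense solve, O(n^3).
double traceViaGenEig(const Eigen::MatrixXd& A, const Eigen::MatrixXd& B)
{
    Eigen::GeneralizedSelfAdjointEigenSolver<Eigen::MatrixXd> ges(A, B);
    return ges.eigenvalues().sum(); // sum of lambdas = trace(B^{-1} A)
}
```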

Chris A.