2

I try to calculate the inverse of a covariance matrix in R. I use both solve and ginv functions however I can not find the identity matrix properly. I am wondering why is happening. Had anyone met this problem before? How to fix it?

d <- data.frame(r1,r2,r3,r4)
ad <- cov(d) # get covariance

You can access ad by:

ad <- structure(c(0.000564103047135273, 0.000209426917735389, 
 -2.50601812852379e-07, 0.000318692159506722, 0.000209426917735389, 
 8.92756413718721e-05, -9.42226640483041e-08, 0.000135853820739977, 
 -2.50601812852379e-07, -9.42226640483041e-08, 1.12190005351388e-10, 
 -1.43381875666865e-07, 0.000318692159506722, 0.000135853820739977, 
 -1.43381875666865e-07, 0.000206733441799332), 
  .Dim = c(4L, 4L), 
  .Dimnames = list(c("r1", "r2", "r3", "r4"), 
                   c("r1", "r2", "r3", "r4")))

#       r1            r2            r3            r4
#r1  5.641030e-04  2.094269e-04 -2.506018e-07  3.186922e-04
#r2  2.094269e-04  8.927564e-05 -9.422266e-08  1.358538e-04
#r3 -2.506018e-07 -9.422266e-08  1.121900e-10 -1.433819e-07
#r4  3.186922e-04  1.358538e-04 -1.433819e-07  2.067334e-04

a <- ginv(ad,tol = 1e-18) `# calculate the inverse of the matrix by ginv` 
#         [,1]         [,2]         [,3]        [,4]
#[1,] 2.369507e+05     7332.961 5.497029e+08    11158.82
#[2,] 7.332964e+03     9194.958 4.198473e+07    13992.28
#[3,] 5.497029e+08 41984725.523 1.353713e+12 63889594.47
#[4,] 1.115881e+04    13992.283 6.388959e+07    21292.54

b <- solve(ad,tol = 1e-18) # calculate the inverse of the matrix by solve   
#         r1            r2            r3            r4
#r1    236950.7  1.701817e+06     549702919 -1.099090e+06
#r2   1184944.4 -2.245684e+20    2905309940  1.475740e+20
#r3 549702919.1  4.213828e+09 1353713017167 -2.677616e+09
#r4   -762702.5  1.475740e+20   -1817729881 -9.697747e+19

ad%*%a
#  r1         r2          r3           r4
#r1  1.000000e+00 3.552714e-15 4.729372e-11 -8.881784e-16
#r2  9.547918e-15 3.015976e-01 5.820766e-11  4.589515e-01
#r3 -2.385245e-18 3.234236e-12 1.000000e+00 -2.125362e-12
#r4 -8.881784e-15 4.589515e-01 6.002665e-11  6.984024e-01

ad%*%b
#  r1    r2          r3           r4
#r1 1.000000e+00  0 0.000000e+00  0.000000000
#r2 1.421085e-14  0 2.910383e-11  0.000000000
#r3 0.000000e+00  0 1.000000e+00 -0.001953125
#r4 2.842171e-14 -4 0.000000e+00  4.000000000
Ben Bolker
  • 211,554
  • 25
  • 370
  • 453
  • @ZheyuanLi, > >dput(ad) structure(c(0.000564103047135273, 0.000209426917735389, -2.50601812852379e-07, 0.000318692159506722, 0.000209426917735389, 8.92756413718721e-05, -9.42226640483041e-08, 0.000135853820739977, -2.50601812852379e-07, -9.42226640483041e-08, 1.12190005351388e-10, -1.43381875666865e-07, 0.000318692159506722, 0.000135853820739977, -1.43381875666865e-07, 0.000206733441799332), .Dim = c(4L, 4L), .Dimnames = list(c("r1", "r2", "r3", "r4"), c("r1", "r2", "r3", "r4"))) – user3710854 Jul 24 '16 at 15:21
  • 2
    Run `eigen(as.matrix(ad))` and look at the eigenvalues. Your system is not well conditioned (1.49e+15). – Matthew Lundberg Jul 24 '16 at 15:28
  • @MatthewLundberg `> eigen(as.matrix(ad))` `$values` `[1] 8.342751e-04 2.583717e-05 7.387089e-13 1.765580e-20` `$vectors` [,1] [,2] [,3] [,4] `[1,] -0.8159716923 0.5780917163 -4.060705e-04 0.000000e+00` `[2,] -0.3174758816 -0.4481146117 -3.101449e-05 -8.357047e-01` `[3,] 0.0003639893 -0.0001886646 -9.999999e-01 -7.738135e-13` `[4,] -0.4831139922 -0.6819114508 -4.719581e-05 5.491791e-01` – user3710854 Jul 24 '16 at 16:07

1 Answers1

5

Highly collinear variables are a threat to numerical stability in matrix calculations at the heart of regression methods. One way of assessing the degree of multicollinearity is to calculate the variance inflation factors (VIFs). This code is extracted from the vif function in Harrell's rms package:

library(rms)
dput(ad)
structure(c(0.0046716674, 0.0017256716, -0.0001918083, 0.0027385111, 
0.001725672, 0.00073037, -8.1816e-05, 0.001159042, -0.0001918083, 
-8.1816e-05, 9.169343e-06, -0.0001298359, 0.0027385111, 0.0011590423, 
-0.0001298359, 0.0018393131), .Dim = c(4L, 4L), .Dimnames = list(
    c("r1", "r2", "r3", "r4"), c("r1", "r2", "r3", "r4")))

 nam <- dimnames(ad)[[1]]
 d <- diag(ad)^0.5
    vif.vals <- diag(solve(ad/(d %o% d)))
    names(vif.vals) <- nam
    vif.vals
#---
          r1           r2           r3           r4 
6.535378e+01 3.338701e+06 1.768640e+04 3.389362e+06 

The rule of thumb is to be worried when you see VIFs above 10. On that basis your VIF are astronomical. (After using the original values from you comment the maximal VIFs is even higher with two values of 1.185158e+14 and they remain all much higher than 10.)

IRTFM
  • 258,963
  • 21
  • 364
  • 487