get wrong inverse matrix from both solve() and ginv() functions

Question

I try to calculate the inverse of a covariance matrix in R. I use both solve and ginv functions however I can not find the identity matrix properly. I am wondering why is happening. Had anyone met this problem before? How to fix it?

d <- data.frame(r1,r2,r3,r4)
ad <- cov(d) # get covariance

You can access ad by:

ad <- structure(c(0.000564103047135273, 0.000209426917735389, 
 -2.50601812852379e-07, 0.000318692159506722, 0.000209426917735389, 
 8.92756413718721e-05, -9.42226640483041e-08, 0.000135853820739977, 
 -2.50601812852379e-07, -9.42226640483041e-08, 1.12190005351388e-10, 
 -1.43381875666865e-07, 0.000318692159506722, 0.000135853820739977, 
 -1.43381875666865e-07, 0.000206733441799332), 
  .Dim = c(4L, 4L), 
  .Dimnames = list(c("r1", "r2", "r3", "r4"), 
                   c("r1", "r2", "r3", "r4")))

#       r1            r2            r3            r4
#r1  5.641030e-04  2.094269e-04 -2.506018e-07  3.186922e-04
#r2  2.094269e-04  8.927564e-05 -9.422266e-08  1.358538e-04
#r3 -2.506018e-07 -9.422266e-08  1.121900e-10 -1.433819e-07
#r4  3.186922e-04  1.358538e-04 -1.433819e-07  2.067334e-04

a <- ginv(ad,tol = 1e-18) `# calculate the inverse of the matrix by ginv` 
#         [,1]         [,2]         [,3]        [,4]
#[1,] 2.369507e+05     7332.961 5.497029e+08    11158.82
#[2,] 7.332964e+03     9194.958 4.198473e+07    13992.28
#[3,] 5.497029e+08 41984725.523 1.353713e+12 63889594.47
#[4,] 1.115881e+04    13992.283 6.388959e+07    21292.54

b <- solve(ad,tol = 1e-18) # calculate the inverse of the matrix by solve   
#         r1            r2            r3            r4
#r1    236950.7  1.701817e+06     549702919 -1.099090e+06
#r2   1184944.4 -2.245684e+20    2905309940  1.475740e+20
#r3 549702919.1  4.213828e+09 1353713017167 -2.677616e+09
#r4   -762702.5  1.475740e+20   -1817729881 -9.697747e+19

ad%*%a
#  r1         r2          r3           r4
#r1  1.000000e+00 3.552714e-15 4.729372e-11 -8.881784e-16
#r2  9.547918e-15 3.015976e-01 5.820766e-11  4.589515e-01
#r3 -2.385245e-18 3.234236e-12 1.000000e+00 -2.125362e-12
#r4 -8.881784e-15 4.589515e-01 6.002665e-11  6.984024e-01

ad%*%b
#  r1    r2          r3           r4
#r1 1.000000e+00  0 0.000000e+00  0.000000000
#r2 1.421085e-14  0 2.910383e-11  0.000000000
#r3 0.000000e+00  0 1.000000e+00 -0.001953125
#r4 2.842171e-14 -4 0.000000e+00  4.000000000

@ZheyuanLi, > >dput(ad) structure(c(0.000564103047135273, 0.000209426917735389, -2.50601812852379e-07, 0.000318692159506722, 0.000209426917735389, 8.92756413718721e-05, -9.42226640483041e-08, 0.000135853820739977, -2.50601812852379e-07, -9.42226640483041e-08, 1.12190005351388e-10, -1.43381875666865e-07, 0.000318692159506722, 0.000135853820739977, -1.43381875666865e-07, 0.000206733441799332), .Dim = c(4L, 4L), .Dimnames = list(c("r1", "r2", "r3", "r4"), c("r1", "r2", "r3", "r4"))) — user3710854, Jul 24 '16 at 15:21
Run `eigen(as.matrix(ad))` and look at the eigenvalues. Your system is not well conditioned (1.49e+15). — Matthew Lundberg, Jul 24 '16 at 15:28
@MatthewLundberg `> eigen(as.matrix(ad))` `$values` `[1] 8.342751e-04 2.583717e-05 7.387089e-13 1.765580e-20` `$vectors` [,1] [,2] [,3] [,4] `[1,] -0.8159716923 0.5780917163 -4.060705e-04 0.000000e+00` `[2,] -0.3174758816 -0.4481146117 -3.101449e-05 -8.357047e-01` `[3,] 0.0003639893 -0.0001886646 -9.999999e-01 -7.738135e-13` `[4,] -0.4831139922 -0.6819114508 -4.719581e-05 5.491791e-01` — user3710854, Jul 24 '16 at 16:07

IRTFM · Answer 1 · 2017-02-21T18:09:15.260

Highly collinear variables are a threat to numerical stability in matrix calculations at the heart of regression methods. One way of assessing the degree of multicollinearity is to calculate the variance inflation factors (VIFs). This code is extracted from the vif function in Harrell's rms package:

library(rms)
dput(ad)
structure(c(0.0046716674, 0.0017256716, -0.0001918083, 0.0027385111, 
0.001725672, 0.00073037, -8.1816e-05, 0.001159042, -0.0001918083, 
-8.1816e-05, 9.169343e-06, -0.0001298359, 0.0027385111, 0.0011590423, 
-0.0001298359, 0.0018393131), .Dim = c(4L, 4L), .Dimnames = list(
    c("r1", "r2", "r3", "r4"), c("r1", "r2", "r3", "r4")))

 nam <- dimnames(ad)[[1]]
 d <- diag(ad)^0.5
    vif.vals <- diag(solve(ad/(d %o% d)))
    names(vif.vals) <- nam
    vif.vals
#---
          r1           r2           r3           r4 
6.535378e+01 3.338701e+06 1.768640e+04 3.389362e+06

The rule of thumb is to be worried when you see VIFs above 10. On that basis your VIF are astronomical. (After using the original values from you comment the maximal VIFs is even higher with two values of 1.185158e+14 and they remain all much higher than 10.)

'Single-figure of merit' tests are available via the condition number -- and kappa -- in base R, so you can get to it without Hmisc. See eg [this old answer of mine](http://stackoverflow.com/a/3431963/143305). — Dirk Eddelbuettel, Jul 24 '16 at 15:44
Helpful indeed. Although I didn't actually use `Hmisc::vif` except as a code reference. — IRTFM, Jul 24 '16 at 16:49

get wrong inverse matrix from both solve() and ginv() functions

1 Answers1