6

ginv() function from MASS package in R produce totally different values compared to MATLAB pinv() function. They both claim to produce Moore-Penrose generalized inverse of a matrix.

I tried to set the same tolerance for the R implementation but the difference persists.

  • MATLAB default tol : max(size(A)) * norm(A) * eps(class(A))
  • R default tol : sqrt(.Machine$double.eps)

Reproduction:

R:

library(MASS)
A <- matrix(c(47,94032,149, 94032, 217179406,313679,149,313679,499),3,3)
ginv(A)

outputs:

              [,1]          [,2]          [,3]
[1,]  1.675667e-03 -8.735203e-06  5.545605e-03
[2,] -8.735203e-06  5.014084e-08 -2.890907e-05
[3,]  5.545605e-03 -2.890907e-05  1.835313e-02

svd(A)

outputs:

$d
[1] 2.171799e+08 4.992800e+01 2.302544e+00

$u
              [,1]         [,2]          [,3]
[1,] -0.0004329688  0.289245088 -9.572550e-01
[2,] -0.9999988632 -0.001507826 -3.304234e-06
[3,] -0.0014443299  0.957253888  2.892454e-01

$v
              [,1]         [,2]          [,3]
[1,] -0.0004329688  0.289245088 -9.572550e-01
[2,] -0.9999988632 -0.001507826 -3.304234e-06
[3,] -0.0014443299  0.957253888  2.892454e-01

MATLAB:

A = [47 94032 149; 94032 217179406 313679; 149 313679 499]
pinv(A)

outputs:

ans =

    0.3996   -0.0000   -0.1147
   -0.0000    0.0000   -0.0000
   -0.1147   -0.0000    0.0547

svd:

[U, S, V] = svd(A)

U =

   -0.0004    0.2892   -0.9573
   -1.0000   -0.0015   -0.0000
   -0.0014    0.9573    0.2892


S =

  1.0e+008 *

    2.1718         0         0
         0    0.0000         0
         0         0    0.0000


V =

   -0.0004    0.2892   -0.9573
   -1.0000   -0.0015   -0.0000
   -0.0014    0.9573    0.2892

Solution: to make R ginv like MATLAB pinv use this function:

#' Pseudo-Inverse of Matrix
#' @description
#' This is the modified version of ginv function in MASS package.
#' It produces MATLAB like pseudo-inverse of a matrix
#' @param X The matrix to compute the pseudo-inverse
#' @param tol The default is the same as MATLAB pinv function
#'
#' @return The pseudo inverse of the matrix
#' @export
#'
#' @examples
#' A <- matrix(1:6,3,2)
#' pinv(A)
pinv <- function (X, tol = max(dim(X)) * max(X) * .Machine$double.eps)
{
  if (length(dim(X)) > 2L || !(is.numeric(X) || is.complex(X)))
    stop("'X' must be a numeric or complex matrix")
  if (!is.matrix(X))
    X <- as.matrix(X)
  Xsvd <- svd(X)
  if (is.complex(X))
    Xsvd$u <- Conj(Xsvd$u)
  Positive <- any(Xsvd$d > max(tol * Xsvd$d[1L], 0))
  if (Positive)
    Xsvd$v %*% (1 / Xsvd$d * t(Xsvd$u))
  else
    array(0, dim(X)[2L:1L])
}
Farid Cheraghi
  • 847
  • 2
  • 12
  • 23
  • The condition number (`cond`) for that matrix is quite large indicating that it is close to singular. But that in and of itself shouldn't explain why the results differ by so much. – horchler Apr 03 '16 at 22:00
  • Confirming that [Wolfram Alpha agrees with MATLAB](http://www.wolframalpha.com/input/?i=PseudoInverse%5B%7B%7B47.,+94032,+149%7D,%7B+94032,+217179406,+313679%7D,%7B+149,+313679,+499%7D%7D%5D) – MichaelChirico Apr 03 '16 at 22:22
  • 1
    Following from the definition, `A %*% ginv(A) %*% A` in `R` closely resembles `A`, while the `MATLAB` result produces something very different from `A`, it seems. – nicola Apr 03 '16 at 22:24
  • Actually, if you give `ginv(A,tol=0)`, the results are close. The `MATLAB` result might be correct and not printed with all the digits. – nicola Apr 03 '16 at 22:27
  • I found the source of the discrepancy, see final edit. – MichaelChirico Apr 03 '16 at 22:47
  • You can also use R's pracma::pinv(A), that one does give the same result as Matlab... – Tom Wenseleers Sep 01 '19 at 19:29

1 Answers1

6

Running debugonce(MASS::ginv), we see that the difference lies in what is done with the singular value decomposition.

Specifically, R checks the following:

Xsvd <- svd(A)
Positive <- Xsvd$d > max(tol * Xsvd$d[1L], 0)
Positive
# [1]  TRUE  TRUE FALSE

If the third element were true (which we can force by setting tol = 0, as suggested by @nicola), MASS::ginv would return:

Xsvd$v %*% (1/Xsvd$d * t(Xsvd$u))
#               [,1]          [,2]          [,3]
# [1,]  3.996430e-01 -7.361507e-06 -1.147047e-01
# [2,] -7.361507e-06  5.014558e-08 -2.932415e-05
# [3,] -1.147047e-01 -2.932415e-05  5.468812e-02

(i.e., the same as MATLAB).

Instead, it returns:

Xsvd$v[, Positive, drop = FALSE] %*% ((1/Xsvd$d[Positive]) * 
  t(Xsvd$u[, Positive, drop = FALSE]))
#               [,1]          [,2]          [,3]
# [1,]  1.675667e-03 -8.735203e-06  5.545605e-03
# [2,] -8.735203e-06  5.014084e-08 -2.890907e-05
# [3,]  5.545605e-03 -2.890907e-05  1.835313e-02

Thanks to @FaridCher for pointing out the source code of pinv.

I'm not sure I 100% understand the MATLAB code, but I think it comes down to a difference in how tol is used. The MATLAB correspondence to Positive in R is:

`r = sum(s>tol)`

Where tol is what's supplied by the user; if none is supplied, we get:

m = 0;
% I don't get the point of this for loop -- why not just `m = max(size(A))`?
for i = 1:n 
   m = max(m,length(A(:,i)));
end
% contrast with simply `tol * Xsvd$d[1L]` in R
%   (note: i believe the elements of d are sorted largest to smallest)
tol = m*eps(max(s)); 
MichaelChirico
  • 33,841
  • 14
  • 113
  • 198
  • 2
    I guess it's a tolerance issue from the `R` side and a printing one in `MATLAB`. As I said in the comments, `ginv(A,tol=0)` agrees with wolphram and `all.equal(A,A %*% ginv(A,tol=0) %*% A)` gives `TRUE`, so the result is correct. It appears that the pseudo-inverse is very sensitive to the actual values; little differences in `A` may generate a complete different pseudo-inverse. – nicola Apr 03 '16 at 22:34
  • @MichaelChirico [pinv source code](https://people.sc.fsu.edu/~jburkardt/m_src/chebfun/@chebfun/pinv.m) is available, although svd's not! – Farid Cheraghi Apr 03 '16 at 22:41
  • @FaridCher thanks, that solves the issue! See edited answer. – MichaelChirico Apr 03 '16 at 22:46
  • @MichaelChirico Thanks! However people should stop saying that [these two functions are equivalent](http://mathesaurus.sourceforge.net/octave-r.html). They are not the same, yet with the modification you made to ginv, they become so. – Farid Cheraghi Apr 03 '16 at 23:00
  • @FaridCher not quite. I noticed the `V(:,1:r)` bit in the `MATLAB` code. That seems odd... I don't know anything about SVD, is `s` guaranteed to be in decreasing order? – MichaelChirico Apr 03 '16 at 23:02
  • @FaridCher what is `eps(217179900)` in MATLAB? – MichaelChirico Apr 03 '16 at 23:07
  • @FaridCher Interesting... that's inconsistent with their statement that `tol` defaults to `max(size(A))*norm(A)*eps`... – MichaelChirico Apr 03 '16 at 23:08
  • @MichaelChirico I am acting as your MATLAB compiler :) heh! `max(size(A)) * norm(A) * eps(class(A))` defaults to `1.4467e-007`. `eps('double')` results in `2.2204e-016`. Octave is very similar to MATLAB. Maybe you can further analyse the case in Octave if you are interested! – Farid Cheraghi Apr 03 '16 at 23:15
  • 1
    You can also use R's pracma::pinv(A), that one does give the same result as Matlab... – Tom Wenseleers Sep 01 '19 at 19:29