1

This is a continuation from a previous question: Rfast hd.eigen() returns NAs but base eigen() does not

I have been having a problem with .Internal(La_rs((x)) returning different results on different machines.

I suspect it may have something to do with number formatting, because on the same machine, if I save as a CSV and re-open, I don't get negatives anymore:

On Clear Linux install:

> load("input_to_La_rs.Rdata")
> r <- .Internal(La_rs(as.matrix(x), only.values = FALSE))
> sum(r$values < 0)
[1] 1
> write.csv(x, "test_for_internal.csv", row.names = FALSE)
> x <- read.csv("test_for_internal.csv")
> r <- .Internal(La_rs(as.matrix(x), only.values = FALSE))
> sum(r$values < 0)
[1] 0

However on my Windows install (and on a CentOS based HPC setup), I can open the rdata file directly and don't get negative values:

> load("input_to_La_rs.Rdata")
> r <- .Internal(La_rs(x, only.values=TRUE))
> sum(r$values < 0)
[1] 0

Is this related to different R builds/library versions? Some setting I don't know about? A bug?

Edit: here is an updated example. It seems to work inconsistently, even on the this particular install, sometimes I do get zero:

set.seed(123)
bigm <- matrix(rnorm(2000*2000,mean=0,sd = 3), 2000, 2000)
m <- Rfast::colmeans(bigm)
y <- t(bigm) - m
xx <- crossprod(y)
x <- unname(as.matrix(xx))
b <- .Internal(La_rs(x, TRUE))
sum(b$values < 0)
# [1] 1

Yet another update: It turns out that the first difference creeps in with Rfast's colmeans producing slightly different results than base colMeans.

    set.seed(123)
    bigm <- matrix(rnorm(2000*2000,mean=0,sd = 3), 2000, 2000)
    m <- colMeans(bigm)
    m <- colmeans(bigm)
    y <- t(bigm) - m
    xx <- crossprod(y)
    x <- unname(as.matrix(xx))
    b <- .Internal(La_rs(x, TRUE))
    sum(b$values < 0)
  # [1] 1

    m <- colMeans(bigm)
    y <- t(bigm) - m
    xx <- crossprod(y)
    x <- unname(as.matrix(xx))
    b <- .Internal(La_rs(x, TRUE))
    sum(b$values < 0)
Stonecraft
  • 860
  • 1
  • 12
  • 30
  • Can you provide a (small) data set that produces the issue? Did you install R from source or from some repository? – Ralf Stubner Sep 17 '19 at 09:06
  • I installed R from the Clear Linux repository. But I have the same issue on an a Xubuntu machine with R installed from CRAN. – Stonecraft Sep 18 '19 at 00:16
  • @RalfStubner ok, I added an example that is as close to minimal as I could get. – Stonecraft Sep 19 '19 at 08:11
  • 2
    I have tried R 3.6.1 on debian with OpenBLAS and reference BLAS/LAPACK as well as R devel with BLAS/LAPACK aas shipped by R. In none of these environments I see negative eigenvalues. However, `sum(eig2$vectors < 0)` is 4959 for OpenBLAS and 4969 for reference BLAS/LAPACK and R's BLAS/LAPACK. Maybe you can provide a `Dockerfile` that can be used to reproduce the issue? – Ralf Stubner Sep 19 '19 at 08:27
  • Yes, in the minimal example only the vectors are negative. With larger examples, I saw 1 negative eigenvalue. But it is the vectors that I am using so that is equally problematic. I've never done anything with docker before, so that would be a project. – Stonecraft Sep 19 '19 at 08:32
  • 1
    I no longer understand the problem. Negative components in the eigen vectors are expected! – Ralf Stubner Sep 19 '19 at 08:46
  • 1
    @RalfStubner I updated with a better example, this one does give a negative eigenvalue. – Stonecraft Sep 19 '19 at 09:45
  • 2
    The matrix `x` is almost rank deficient, i.e. the smallest eigen value is very close to zero. It might be that the numerical algorithm got unstable and produced a small negative value. – Ralf Stubner Sep 19 '19 at 13:34

2 Answers2

1

The hd.eigen function in Rfast works only, only for the case of n < p, i.e. when the rows are less than the columns. In the help page of the hd.eigen function is the reference to the paper that suggested this algorithm. The algorithm I do not think works for any other case. Perhaps that is why you get NAs.

Rfast2 contains a function called "pca" that works for either case, np. Try that one also. Inside there, an SVD is effectively performed calling "svd" from R.

Mike
  • 106
  • 5
0

I think the difference is rather negligible. Average difference of the matrix elements equal to 10^(-12) or less are actually zero.

Mike
  • 106
  • 5
  • They are tiny, but they are the difference between negative and non-negative eigenvalues being produced downstream, eventually resulting in NAs being produced by `hd.eigen`. So is the "problem", if there is one, the tiny difference in column means, the fact that `La_rs` returns a negative eigenvalue, or that hd.eigen converts the negatives to NAs? – Stonecraft Sep 26 '19 at 17:32