1

I am having problems with hd.eigen in Rfast. It gives extremely close results to eigen with most data, but sometimes hd.eign returns an empty $vector, NAs, or other undesirable results. For example:

> set.seed(123)
> bigm <- matrix(rnorm(2000*2000,mean=0,sd = 3), 2000, 2000)
> 
> e3 = eigen(bigm)
> length(e3$values)
[1] 2000
> length(e3$vectors)
[1] 4000000
> sum(is.na(e3$vectors) == TRUE)
[1] 0
> sum(is.na(e3$vectors) == FALSE)
[1] 4000000
> 
> e4 = hd.eigen(bigm, vectors = TRUE)
> length(e4$values)
[1] 2000
> length(e4$vectors)
[1] 4000000
> sum(is.na(e4$vectors) == TRUE)
[1] 2000
> sum(is.na(e4$vectors) == FALSE)
[1] 3998000

Other than the fact that it breaks my script, do these NAs indicate a deeper problem with my data? Or is hd.eig not able to handle some situations that the stock eigen() can? Is one better than the other?

Edit: As per Ralf's suggestion, I checked my BLAS versions, and it does seem like maybe R is looking for the wrong version/in the wrong place:

~ $ ldd /usr/lib64/R/bin/exec/R
        linux-vdso.so.1 (0x00007ffeec3b9000)
        libR.so => not found
        libRblas.so => not found
        libgomp.so.1 => /usr/lib64/libgomp.so.1 (0x00007feb27ef2000)
        libpthread.so.0 => /usr/lib64/libpthread.so.0 (0x00007feb27ecf000)
        libc.so.6 => /usr/lib64/libc.so.6 (0x00007feb27cdb000)
        /lib64/ld-linux-x86-64.so.2 => /usr/lib64/ld-linux-x86-64.so.2 (0x00007feb27f7b000)

Also, I am unclear on whether openBLAS is equivalent to the BLAS that is installed by default in other distros.

> sessionInfo()
R version 3.6.1 (2019-07-05)
Platform: x86_64-generic-linux-gnu (64-bit)
Running under: Clear Linux OS

Matrix products: default
BLAS/LAPACK: /usr/lib64/libopenblas_nehalemp-r0.3.6.so

edit 2: I tried the same example on a CentOS-based HPC system, and did not get any NA's. There, sessionInfo() reveals:

BLAS/LAPACK: /hpc/packages/minerva-centos7/intel/parallel_studio_xe_2019/compilers_and_libraries_2019.0.117/linux/mkl/lib/intel64_lin/libmkl_gf_lp64.so

Edit 3: The expression in hd.eign that produces the NAs is

vectors <- tcrossprod(y, t(FF) * L^(-0.5))

specifically, L^(-0.5) produces NaN at index 2000

> L[2000]
[1] -1.136237e-12

However, on the two machines where no NAs are returned L[2000] is positive (although slightly different, 5.822884e-14 on the HPC system and 3.022511e-12 on my Windows machine running Microsoft build of R.

Edit 4: The difference appears to originate in the base eigen() function, which returns one negative value from the crossprod() matrix xx on the problem machine but not not the other two. I saved the xx object and opened between computers, so I know that the input to eigen() was exactly the same.

Edit 5: I drilled one level deeper and found that the original negative value comes from this statement in eigen()

    z <- if (!complex.x) 
      .Internal(La_rs(x, only.values))
    else .Internal(La_rs_cmplx(x, only.values))

Edit 6: If I save as a CSV and then re-open, the problem computer does not produce negative eigenvalues.

> load("/home/james/nfs-cloud/PanosLab/CircRNA/input_to_La_rs.Rdata")
> r <- .Internal(La_rs(as.matrix(x), only.values = FALSE))
> sum(r$values < 0)
[1] 1
> write.csv(x, "test_for_internal.csv", row.names = FALSE)
> x <- read.csv("test_for_internal.csv")
> r <- .Internal(La_rs(as.matrix(x), only.values = FALSE))
> sum(r$values < 0)
[1] 0

Does that give anyone a clue? Is this a bug?

Stonecraft
  • 860
  • 1
  • 12
  • 30
  • 1
    I cannot reproduce your findings using R 3.6.1 and Rfast 1.9.5 on Debian Linux. – Ralf Stubner Sep 15 '19 at 21:07
  • Interesting. I am also using R 3.6.1 and Rfast 1.9.5, but on Clear Linux. I will try on a different computer when I have a chance. – Stonecraft Sep 15 '19 at 22:01
  • You could also try with one of the docker containers provided by the rocker project. BTW, I am using OpenBLAS for BLAS /LAPCK and gcc version 9.2.1. – Ralf Stubner Sep 16 '19 at 05:58
  • I tried it on a different VM (Microsoft R for Windows + Rfast). And indeed, `hd.eigen()` produced no NAs. I wonder what's going on with my installation. – Stonecraft Sep 16 '19 at 09:04
  • However when I run the same code on Ubuntu 18.04 (also R 3.6 and Rfast 1.9.5) I get the NAs again. What could be going on here? – Stonecraft Sep 16 '19 at 15:09
  • Which BLAS/LAPACK do you use? – Ralf Stubner Sep 16 '19 at 15:14
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/199533/discussion-between-stonecraft-and-ralf-stubner). – Stonecraft Sep 16 '19 at 17:10
  • According to `sessioninfo()`: BLAS/LAPACK: /usr/lib64/libopenblas_nehalemp-r0.3.6.so However I think some links might be broken because: ```ldd /usr/lib64/R/bin/exec/R linux-vdso.so.1 (0x00007ffeec3b9000) libR.so => not found libRblas.so => not found ``` – Stonecraft Sep 17 '19 at 00:06
  • 1
    It is not untypical that `libRblas.so` is not installed in a standard location. That's why the script that actually starts the R executable sets `LD_LIBRARY_PATH`. Anyway, I am out of ideas. One way forward would be a way to reproduce the issue, e.g. via a docker container. – Ralf Stubner Sep 17 '19 at 07:37
  • Ok thanks. I since it morphed into a different question, I created a new thread here: https://stackoverflow.com/questions/57967678/internalla-rs-returns-negative-values-in-on-some-installations – Stonecraft Sep 17 '19 at 07:58
  • It is well known that overdoing it with optimization flags *can* break numerical algorithms. This is why you shouldn't lightly use -ffast-math for example, as this allows the compiler to make some transformations that can break accuracy. Maybe such an optimization was, unfortunately, done in Clear Linux. Maybe you can document the matrix where ClearLinux fails to find Eigenvectors, so this can be fixed eventually. – Has QUIT--Anony-Mousse Sep 19 '19 at 18:01

1 Answers1

0

The hd.eigen function in Rfast is designed for the case of n smaller than p only.

Mike
  • 106
  • 5