I am trying to optimize a function (NbClust) so that it can handle bigger matrices, something I have posted about previously: "Reducing NbClust memory usage". This led me to create a (still very experimental) fork of NbClust, which is here: https://github.com/jbhanks/BigNbClust

One of the main bottlenecks is the use of the var() function, so I have replaced it with cova from Rfast. The results are not exactly identical, and I need to figure out whether they are close enough for the two functions to be used interchangeably. Could other cases arise where the differences are larger?

> bigm <- matrix(rnorm(1000*1000,mean=0,sd = 3), 1000, 1000)
> v <- var(bigm)
> cvm <- cova(bigm)
> sum(v != cvm)
[1] 954579
> sum(v == cvm)
[1] 45421
> cor(c(v), c(cvm), method = "pearson")
[1] 1
> cor(c(v), c(cvm), method = "spearman")
[1] 1
> diff = v - cvm
> mean(diff)
[1] -4.557742e-19
> max(diff)
[1] 2.4869e-14
> bigm <- matrix(rnorm(10000*10000,mean=0,sd = 3), 10000, 10000)
> v <- var(bigm)
> cvm <- cova(bigm)
> sum(v != cvm)
[1] 97986031
> sum(v == cvm)
[1] 2013969
> cor(c(v), c(cvm), method = "pearson")
[1] 1
> cor(c(v), c(cvm), method = "spearman")
[1] 1
> diff = v - cvm
> mean(diff)
[1] -3.875792e-20
> max(diff)
[1] 9.05942e-14

However, in some real-world situations (unfortunately I cannot share the actual data), cova throws an error where var does not:

Error in sqrt(n) : non-numeric argument to mathematical function

This appears to happen because, at specific loop iterations, cova receives a vector instead of a matrix (cova reads dim() of its argument at an early step). I fixed it by always coercing the object to a matrix, but I'm still worried that my changes might have unintended consequences. I can't say I truly grasp the inner workings of the function; I just replaced a few functions with ones that I understand to be equivalent.
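A minimal sketch of how a matrix can silently turn into a vector, using only base R. The matrix `m` and the variable names are made up for illustration; this is not code from NbClust or Rfast. It shows that selecting a single column drops the dimensions, that taking `sqrt()` of a missing dimension reproduces the error message quoted above, and that `drop = FALSE` or `as.matrix()` keeps a matrix.

```r
set.seed(42)
m <- matrix(rnorm(20), nrow = 5, ncol = 4)   # illustrative data only

one_col <- m[, 2]               # single column: dimensions are dropped
dim(one_col)                    # NULL, so dim(x)[1] is NULL as well

# sqrt() of a NULL dimension fails with the message seen in the question
msg <- tryCatch(sqrt(dim(one_col)[1]), error = function(e) conditionMessage(e))
msg

kept    <- m[, 2, drop = FALSE] # stays a 5 x 1 matrix
coerced <- as.matrix(one_col)   # a vector becomes a one-column matrix

# var() accepts both forms; the matrix form returns a 1 x 1 matrix
var(one_col)
var(kept)
```

One thing to watch with the coercion approach: `as.matrix()` on a vector always yields a single column, so a dropped row such as `m[2, ]` comes back as a 4 x 1 matrix rather than 1 x 4. Using `drop = FALSE` at the point of subsetting preserves the original orientation.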

Stonecraft

1 Answer

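cova and var give you the same results up to floating-point rounding; you can see this from the mean difference (on the order of 1e-19) and the maximum difference (on the order of 1e-14). It is reasonable that sum(v != cvm) is not zero: the two implementations do not agree to every decimal place, so an exact == comparison flags almost every entry even though the values match to many significant digits.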
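As a rough sketch of how to check "close enough" rather than exact equality, the comparison can be done with a tolerance. The example below uses base R only; `cov_crossprod` is a hypothetical stand-in for a second covariance implementation and is not Rfast's code. With Rfast installed, `cova(x)` could be substituted for it.

```r
set.seed(1)
x <- matrix(rnorm(200 * 50, mean = 0, sd = 3), 200, 50)

v <- var(x)

# Hypothetical second implementation (an assumption for illustration):
# centre the columns, then form the cross-product.
cov_crossprod <- function(x) {
  xc <- sweep(x, 2, colMeans(x))
  crossprod(xc) / (nrow(x) - 1)
}
cv <- cov_crossprod(x)

max(abs(v - cv))            # largest absolute discrepancy
max(abs(v - cv) / abs(v))   # largest relative discrepancy
isTRUE(all.equal(v, cv))    # equality within the default tolerance (about 1.5e-8)
identical(v, cv)            # bitwise equality, too strict for floating point
```

Note that `max(diff)` only reports the largest positive difference, so `max(abs(diff))` is the safer summary. To probe for cases where the gap grows, it may be worth repeating the check on data whose mean is large relative to its spread, since some one-pass covariance formulas lose precision there; whether that applies to a given implementation is something to test rather than assume.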

Mike