My intention was to write several functions aimed at finding the overall similarity between two covariance matrices, either by multiplying them with random vectors and correlating the response vectors or by bootstrapping one of the matrices to obtain the correlation coefficient distribution that can serve for comparison. But in both cases I got erroneous results. The observed between-matrix correlation was high up to 0.93, but the distribution only ranged up to 0.2 the most. This is the function`s code:
resamplerSimAlt <- function(mat1, mat2, numR, graph = FALSE)
{
statSim <- numeric(numR)
mat1vcv <- cov(mat1)
mat2vcvT <- cov(mat2)
ltM1 <- mat1vcv[col(mat1vcv) <= row(mat1vcv)]
ltM2T <- mat2vcvT[col(mat2vcvT) <= row(mat2vcvT)]
statObs <- cor(ltM1, ltM2T)
indice <- c(1:length(mat2))
resamplesIndices <- lapply(1:numR, function(i) sample(indice, replace = F))
for (i in 1:numR)
{
ss <- mat2[sample(resamplesIndices[[i]])]
ss <- matrix(ss, nrow = dim(mat2)[[1]], ncol = dim(mat2)[[2]])
mat2ss <- cov(ss)
ltM2ss <- mat2ss[col(mat2ss) <= row(mat2ss)]
statSim[i] <- cor(ltM1, ltM2ss)
}
if (graph == TRUE)
{
plot(1, main = "resampled data density distribution", xlim = c(0, statObs+0.1), ylim = c(0,14))
points(density(statSim), type="l", lwd=2)
abline(v = statObs)
text(10, 10, "observed corelation = ")
}
list( obs = statObs , sumFit = sum(statSim > statObs)/numR)
}
In fact it is hard for me to belive that correlation coefficient between two original matrices is high, and the one between the first original matrix and the second re-sampled one is maximal 0.2 after 10000 bootstrap repetitions.
Any comments on the validity of the code?