I am trying to run a correlation test between the columns of a table. I also ran the same test with the bootstrap method. I wanted to compare the two results, but they turned out to be exactly the same, so I am wondering whether I did something wrong.

`df` is a data.table with 20,000 rows and 7 columns; the first column is the key.

Below is my bootstrap code. Could you please check it? Is it possible for the bootstrap result to be the same as the result from running the test on the whole dataset? Thank you!

n = nrow(df)
cor.small <- function(d,i= c(1:n)){
 d2 <- d[i,]
 cormat <- cor(d2[,-1,with=FALSE])
 upper <- get_upper_tri(cormat)
 return(upper)
}

result <- boot(data = df,statistic = cor.small, R= 999)
StupidWolf
VeraShao
  • I suspect you don't know why we use bootstrapping. I also suspect your problem is that you don't understand what the output of `boot` means. Anyway, a quick check (after some googling and guessing how `get_upper_tri` might be defined) indicates that your code works as expected. – Roland Oct 25 '17 at 06:16
  • The bootstrap estimates would be `colMeans(result$t)`. Read the `Value` section of the help page: `t0 The observed value of statistic applied to data.` So `t0` is simply `cor(df[, -1])`, the matrix you are getting (without the `NA` values). Also, get rid of `=c(1:n)`, since this is the `indices` argument and should vary from call to call. – Rui Barradas Oct 25 '17 at 09:43

1 Answer

You should call the `boot` function like this (I used the iris dataset to have something to work with, and modified the code a bit in places):

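library(boot)  # provides boot()

# Statistic for boot(): correlations computed on the resampled rows `i`.
# No default for `i`, because boot() supplies new indices on every call.
cor.small <- function(d, i) {
        cormat <- cor(d[i, -1])
        # the matrix is symmetric, so one triangle holds every pair once
        upper <- cormat[lower.tri(cormat)]
        return(upper)
}

df <- iris[ , -5]
nsamp = ceiling(nrow(df) / 2) # or use a different value
nrun = 10
set.seed(1)
# manual check on one resample (drawn from the first nsamp rows only)
cor.small(df, sample(1:nsamp, nsamp, replace = TRUE))

boot(data = df, statistic = cor.small, R = nrun)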
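
To see why the printed result can look identical to the full-data correlations, it may help to inspect the returned object directly. The following is only a sketch on the iris data, not the asker's table; the names `cor_stat`, `dat`, `res`, `full`, `boot_mean` and `boot_se` are made up for illustration. It relies on the documented `t0` and `t` components of a `boot` object.

```r
library(boot)  # recommended package that ships with R

# Statistic: pairwise correlations of the rows selected by `i`
cor_stat <- function(d, i) {
  cormat <- cor(d[i, ])
  cormat[upper.tri(cormat)]
}

dat <- iris[, 1:4]  # illustrative data only
set.seed(1)
res <- boot(data = dat, statistic = cor_stat, R = 200)

# t0 is the statistic applied to the original data,
# so it matches a direct call on the whole dataset
full <- cor(dat)[upper.tri(cor(dat))]
all.equal(res$t0, full)

# t holds one row per bootstrap replicate, one column per correlation
dim(res$t)

# Summaries of the replicates are what actually differ from t0
boot_mean <- colMeans(res$t)       # bootstrap estimates
boot_se   <- apply(res$t, 2, sd)   # bootstrap standard errors
```

So if you compare `res$t0` with the plain `cor()` output, they agree by construction; the bootstrap information is in `res$t`.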
knb