I have n y variables with 100 rows each. I want to resample each one with sample sizes from 1 to nrows. The following code gives the expected result, but it is tedious and impractical. To reproduce the situation, let's suppose that y has 5 rows:

y<-rnorm(n=5, mean=10, sd=2)
R=1000 #number of resamplings
boot.means = numeric(R)
for (i in 1:R) { boot.sample = sample(y, 1, replace=T)
boot.means[i] = mean(boot.sample) }
m1<-mean(boot.means)  
d1<-sd(boot.means)  
cv1 =(d1*100)/m1  

R=1000 #number of resamplings
boot.means = numeric(R)
for (i in 1:R) { boot.sample = sample(y, 2, replace=T)
boot.means[i] = mean(boot.sample) }
m2<-mean(boot.means)  
d2<-sd(boot.means)  
cv2 =(d2*100)/m2  

R=1000 #number of resamplings
boot.means = numeric(R)
for (i in 1:R) { boot.sample = sample(y, 3, replace=T)
boot.means[i] = mean(boot.sample) }
m3<-mean(boot.means)  
d3<-sd(boot.means)  
cv3 =(d3*100)/m3  


R=1000 #number of resamplings
boot.means = numeric(R)
for (i in 1:R) { boot.sample = sample(y, 4, replace=T)
boot.means[i] = mean(boot.sample) }
m4<-mean(boot.means)  
d4<-sd(boot.means)  
cv4 =(d4*100)/m4


R=1000 #number of resamplings
boot.means = numeric(R)
for (i in 1:R) { boot.sample = sample(y, 5, replace=T)
boot.means[i] = mean(boot.sample) }
m5<-mean(boot.means)  
d5<-sd(boot.means)  
cv5 =(d5*100)/m5

CV.OK<-(c(cv1,cv2,cv3,cv4,cv5))
plot(CV.OK)

I would like to use something like the following code, but it gives unexpected results. Could somebody please help me? Thanks.

R = 1000  #number of resamplings
boot.sample=seq(1,5, by=1)
boot.means = numeric(R)
boot.sd = numeric(R)
m = 5
d = 5
for (i in 1:5) {
  for (j in 1:R) {
    boot.sample[i] = sample(y, i, replace=T)
    boot.means[j] = mean(boot.sample[i])
    boot.sd[j] = sd(boot.sample[i])
    m[i]=mean(boot.means[j])  
    d[i]=mean(boot.sd[j]) 
  }
}
CV.Fail<-(d*100)/m 
StupidWolf

2 Answers

I think you want this:

y<-rnorm(n=5, mean=10, sd=2)
R = 1000  #number of resamplings
CVs <- numeric(5)
for (i in 1:5) {
  boot.means = numeric(R)
  for (j in 1:R) {
    boot.sample = sample(y, i, replace=T)
    boot.means[j] = mean(boot.sample)
  }
  m=mean(boot.means)  
  d=sd(boot.means) 
  CVs[i] = (d*100)/m 
}
plot(CVs)
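
The same computation can also be written without explicit loops. This is only a minimal sketch: `boot_cv` is a hypothetical helper name, and the seed is an assumption added so the run is reproducible. It uses `replicate()` for the R resamplings and `sapply()` over the sample sizes.

```r
set.seed(42)                          # assumed seed, only for reproducibility
y <- rnorm(n = 5, mean = 10, sd = 2)
R <- 1000                             # number of resamplings

# hypothetical helper: CV (in percent) of R bootstrap means at sample size k
boot_cv <- function(k, y, R) {
  boot.means <- replicate(R, mean(sample(y, k, replace = TRUE)))
  sd(boot.means) * 100 / mean(boot.means)
}

# one CV per sample size 1..length(y)
CVs <- sapply(seq_along(y), boot_cv, y = y, R = R)
plot(CVs, xlab = "sample size", ylab = "CV (%)")
```

For the real data, `y` would be one of the 100-row variables, so `seq_along(y)` runs from 1 to 100.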
Sandipan Dey

In R, explicit loops are often slower than vectorized or apply-style code, so it is worth avoiding them where you can. I hope I understood the problem correctly. I wrote a small function that should get you started from a different point.

library(plyr)
library(dplyr)

# dummy data set
data_set = data.frame(value = runif(200), group = rep(c("a", "b"), each=100))

# create a function that takes the sample size as an argument
iterative_sample = function(sample_size, data){
# group the data (your 'n' equals the number of groups;
# here that's 'a' and 'b')
  sample_temp = dplyr::group_by(data, group) %>%
    # draw sample_size rows (with replacement) from each group
    sample_n(sample_size, replace=T) %>%
    # compute summary stats for each group
    summarize(mean = mean(value), sd = sd(value)) %>%
    # attach the sample size to keep track
    mutate(sample_size = sample_size)
  # we must return a data frame to use ldply later on
  return(sample_temp)
}

# that's the vector of sample sizes we are going to iterate over using ldply
sample_vect = c(1:2)

# ldply (plyr package) applies our custom function to each element of a
# list or vector and returns a data frame; check out the man page
# ?ldply

# ...
#
#
#    .data: list to be processed
#
#     .fun: function to apply to each piece
#
#      ...: other arguments passed on to ‘.fun’
#
# ...
#

ldply(.data = sample_vect, .fun = iterative_sample, data_set)
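
As written, `iterative_sample` draws one sample per group for each sample size, so the number of bootstrap replicates is not a parameter of it. The following is a minimal base R sketch of one way to add that step. It swaps the plyr/dplyr pipeline for `replicate()` and `mapply()`. `boot_cv`, `sample_sizes`, `grid`, and the seed are hypothetical names and values introduced for illustration, and `R = 1000` is taken from the question.

```r
set.seed(1)  # assumed seed, only so the sketch is reproducible

# same dummy layout as above: one 'value' column and a 'group' label
data_set <- data.frame(value = runif(200),
                       group = rep(c("a", "b"), each = 100))

R <- 1000             # number of bootstrap replicates (from the question)
sample_sizes <- 1:5   # hypothetical range; 1:100 would cover all rows

# hypothetical helper: CV (in percent) of R bootstrap means of x at sample size k
boot_cv <- function(k, x, R) {
  boot.means <- replicate(R, mean(sample(x, k, replace = TRUE)))
  sd(boot.means) * 100 / mean(boot.means)
}

# one row per combination of sample size and group
grid <- expand.grid(sample_size = sample_sizes,
                    group = as.character(unique(data_set$group)),
                    stringsAsFactors = FALSE)

# compute the CV for every row of the grid
grid$cv <- mapply(
  function(k, g) boot_cv(k, data_set$value[data_set$group == g], R),
  grid$sample_size, grid$group
)
grid
```

Here the sample size and the number of replicates are separate arguments, so each can be changed without touching the other.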
sluedtke
  • Yes, it works as I expected. But one question remains: where should the number of bootstrap resamplings, i.e. 1000, be specified? – Walter Pereira Oct 18 '16 at 01:39
  • That is supposed to be specified via the `sample_vect` variable. So if you go for `sample_vect = c(1:100)` it will eventually take up to 100 samples and compute the summary statistics. – sluedtke Oct 18 '16 at 09:33