I have the following matrix (let's call it df), for which I would like to create bootstrapped means and 95% confidence intervals for each column, because the distribution is heavily weighted toward 0. I would like the means and CIs to be added to the bottom of the matrix as new rows. This is a small subset of the data; the true data has >600 rows, which will make the bootstrapping much more effective.

row.names   V183    V184    V185    V186    V187    V188    V189    V190    V191    V192    V193    V194    V195    V196    V197    V198    V199    V200    V201    V202    V203    V204    V205
1   0.07142857  0.07142857  0.07142857  0.07142857  0.07142857  0.07142857  0.07142857  0.07142857  0.07692308  0.07692308  0.07692308  0.07692308  0.07692308  0.07692308  0.07692308  0.07692308  0.07692308  0.07692308  0.07692308  0.07692308  0.07692308  NA  NA
2   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
3   0.022   0.022   0.022   0.022   0.022   0.022   0.022   0.022   0.022   0.022   0.022   0.022   0.022   0.022   0.022   0.022   0   NA  NA  NA  NA  NA  NA
4   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0.07692308  0.07692308  0.07692308  0.07692308  0.07692308  0.07692308  0.07692308  0.07692308
5   0   0   0   0   0.066   0.066   0.066   0.066   0.066   0.066   0.066   0.066   0.066   0.066   0   0   0   0   0   0   0   0   0
6   0.077   0.077   0.077   0.077   0.077   0.077   0.077   0.077   0.077   0.077   0.077   0.077   0.077   0.077   0.077   0.077   0.077   0.077   0.077   0   0   0   0
7   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
8   0.07142857  0.07142857  0.07142857  0.07142857  0.07142857  0.07142857  0.07142857  0.07142857  0.07142857  0.07142857  0.07142857  0   0   0   0   0   0   0   0   0   0   0   0
9   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   NA  NA  NA  NA  NA  NA
10  0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
11  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  0.03225806  0.03225806  0.03225806  0.03225806  0.03225806  0.03225806  0.03225806  0.03225806  0.03225806  0.03225806  0.03225806
12  0   0   0   0   0   0   0   0   0   0   0   0   0   NA  NA  NA  NA  NA  NA  NA  NA  NA  NA
13  0   0   0   0   0   0   0   0   0   NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA
14  0   0   0.033   0.033   0.033   0.033   0.033   0.033   0.033   0.033   0.033   0.033   0.033   0.033   0.033   0.033   0.033   0   0   0   0   0   0

I have had success creating bootstrapped values for a single column, but have not been successful in creating a `for()` loop that will populate an entire row of bootstrapped values for the matrix.

The following is my code for a single column.

dfsub<-df[,1]
mean.boot <- function(dfsub, d) {
E=dfsub[d,]
return(mean(E, na.rm=T))}
b = boot(dfsub, mean.boot, R=1000)
b

Any thoughts? Would a `for` loop or an `apply` function work better?

Also, the output for the booted values gives an original value and a bias, but where is the actual bootstrapped mean?

StupidWolf
ctlamb
  • In general, `apply`/`sapply` etc. are a lot faster than `for` loops in R – phonixor Jul 07 '14 at 13:26
  • What was the problem with the solutions to your [previous question](http://stackoverflow.com/questions/24598896/creating-bootstrapped-means-and-ci-from-matrix)? Is this substantially different? – MrFlick Jul 07 '14 at 14:11
  • It would help if you included **R** code that would reproduce "**df**" (if only approximately) and then paste only the first few rows of said data frame (and get rid of the current df-output you have now). – Steve S Jul 07 '14 at 15:14

1 Answer

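This is a somewhat confusing question, as I'm not sure whether you're bootstrapping by row or by column, plus there's a bit of the code that doesn't work, specifically `E=dfsub[d,]` (once a single column has been extracted, `dfsub` is presumably a plain vector, so two-dimensional indexing fails). But if you want to get bootstrapped means for each column, a simple call from the apply family should work fine, like so: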

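> library(boot)

> # Statistic passed to boot(): mean of the resampled values, ignoring NAs
> myMeanFun <- function(d, i) {
    d2 <- d[i]
    return(mean(d2, na.rm=TRUE))
}

> # Run 1000 bootstrap resamples on a single column
> myBootFun <- function(d) {
    boot(d, myMeanFun, R = 1000)
}

> lapply(df[,-1], function(x) myBootFun(x) )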

$V183

ORDINARY NONPARAMETRIC BOOTSTRAP


Call:
boot(data = d, statistic = myMeanFun, R = 1000)


Bootstrap Statistics :
     original       bias    std. error
t1* 0.0186044 0.0004565272 0.008418108

$V184

ORDINARY NONPARAMETRIC BOOTSTRAP


Call:
boot(data = d, statistic = myMeanFun, R = 1000)


Bootstrap Statistics :
     original       bias    std. error
t1* 0.0186044 3.504457e-05 0.008293219

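And you can use something like this to access particular statistics. Here `$t0` is the mean computed on the original data (the "original" value in the printout above); the bootstrap replicates themselves are stored in `$t`, so the bootstrapped mean is `mean(myBootFun(x)$t)`, i.e. the original value plus the bias: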

> sapply(df[,-1], function(x) myBootFun(x)$t0 )

      V183       V184       V185       V186       V187       V188       V189 
0.01860440 0.01860440 0.02114286 0.02114286 0.02621978 0.02621978 0.02621978 
      V190       V191       V192       V193       V194       V195       V196 
0.02621978 0.02664243 0.02886264 0.02886264 0.02291026 0.02362932 0.02559843 
      V197       V198       V199       V200       V201       V202       V203 
0.02009843 0.02650869 0.02467535 0.02631042 0.02631042 0.01861042 0.01861042 
      V204       V205 
0.01213124 0.01213124 
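To make the relationship between the printed "original" and "bias" values and the bootstrapped mean concrete, here is a minimal sketch. The vector `x` and the name `mean_stat` are made up for illustration and are not taken from the question's data.

```r
library(boot)

set.seed(1)  # only so that the sketch is repeatable

# Made-up, zero-heavy values with one NA (not the question's data)
x <- c(0.07, 0, 0.022, 0, 0, 0.077, 0, 0.07, 0, 0, NA, 0, 0, 0)

# Statistic for boot(): mean of the resampled values, ignoring NAs
mean_stat <- function(d, i) mean(d[i], na.rm = TRUE)

b <- boot(x, mean_stat, R = 1000)

observed  <- b$t0                  # statistic on the original data ("original")
boot_mean <- mean(b$t)             # mean of the 1000 bootstrap replicates
bias      <- boot_mean - observed  # what the printout reports as "bias"
```

So the bootstrapped mean is not printed directly; it can be recovered as `original + bias` or computed from `b$t`.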

Also see the boot.ci function for confidence intervals, plus this guide might be useful to you:

http://www.ats.ucla.edu/stat/r/faq/boot.htm
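As a rough sketch of how `boot.ci` could be combined with the approach above to append a bootstrapped mean and a 95% interval to the bottom of the data, something like the following may work. The small `df`, the helper `boot_summary`, and the choice of percentile intervals (`type = "perc"`) are assumptions made for illustration; other interval types may suit zero-heavy data better.

```r
library(boot)

set.seed(42)  # only so that the sketch is repeatable

# Hypothetical stand-in for the question's data: zero-heavy, with some NAs
df <- data.frame(
  V1 = c(0.071, 0, 0.022, 0, 0, 0.077, 0, 0.071, 0, 0, NA, 0, 0, 0),
  V2 = c(0.071, 0, 0.022, 0, 0.066, 0.077, 0, 0.071, 0, 0, NA, 0, 0, 0.033)
)

# Statistic for boot(): mean of the resampled values, ignoring NAs
mean_stat <- function(d, i) mean(d[i], na.rm = TRUE)

# For one column: bootstrapped mean plus a percentile 95% interval
boot_summary <- function(x, R = 1000) {
  b  <- boot(x, mean_stat, R = R)
  ci <- boot.ci(b, conf = 0.95, type = "perc")
  # ci$percent holds: level, two order-statistic indices, lower, upper
  c(boot_mean = mean(b$t),
    ci_lower  = ci$percent[4],
    ci_upper  = ci$percent[5])
}

stats  <- sapply(df, boot_summary)      # 3 x ncol(df) matrix
result <- rbind(as.matrix(df), stats)   # summary rows appended at the bottom
```

One caveat: `boot.ci` cannot compute an interval when every bootstrap replicate is identical (for example, a column that is entirely 0), so such columns would need to be handled separately.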

jogall
  • my mistake, I am looking for bootstrapped means for each column. Sorry, it was a typo, along with the other typo in the code. I will be much more careful next time. I have updated my question. – ctlamb Jul 08 '14 at 02:50
  • Thanks, Jogal, the provided script works well for generating bootstrapped means. When I use the second command you supplied (the sapply....), I receive this error message: "Error in summary(myBootFun(x))$original : $ operator is invalid for atomic vectors". Also, is there a way I could select only a certain group of rows, say rows 1:13? In my large dataset I have some other summary stats at the bottom of the sheet, and the formula is also including these in the bootstrapped means – ctlamb Jul 08 '14 at 03:32
  • My mistake, it should be something like: `sapply(df[,-1], function(x) myBootFun(x)$t0 )` -- I've edited the answer accordingly. Also, you can use bracket indexing in the first term of the apply function to select specific parts of the data, e.g. to select only rows 1:13 and columns 2:5: `sapply(df[,2:5][1:13,], function(x) myBootFun(x)$t0 )` – jogall Jul 08 '14 at 08:18
  • Fantastic, thanks Jogal. The row and column selection worked great. I am getting this error when I run the sapply function to populate the bootstrapped values in my table: "In mean.default(d2, na.rm = T) : argument is not numeric or logical: returning NA" – ctlamb Jul 09 '14 at 22:41
  • Sounds like you have non-numeric data in your matrix: apply functions input a matrix, and as a matrix must be of all one type, having any characters in your matrix will cause all numeric values to be converted to characters. You can either exclude the rows or columns containing characters using bracket indexing, or try using another apply-like function that inputs a data.frame, such as `colwise()` or `numcolwise()` in the `plyr` library. – jogall Jul 10 '14 at 14:48
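As one possible way to act on the suggestion above without extra packages, the numeric columns can be picked out with base R before applying any per-column function. The data frame and its `id` column below are hypothetical, and `mean` stands in for the bootstrap function to keep the sketch short.

```r
# Hypothetical data frame with one non-numeric column mixed in
df <- data.frame(
  id = c("a", "b", "c", "d"),
  V1 = c(0, 0.07, NA, 0.02),
  V2 = c(0, 0, 0.03, 0),
  stringsAsFactors = FALSE
)

# Logical vector with one entry per column: TRUE where the column is numeric
is_num <- sapply(df, is.numeric)

# Keep rows 1:3 and only the numeric columns; drop = FALSE keeps a data frame
num_df <- df[1:3, is_num, drop = FALSE]

# Any per-column function can now be applied safely
col_means <- sapply(num_df, mean, na.rm = TRUE)
```

The same `num_df` could be passed to `sapply` with the bootstrap function in place of `mean`.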