I have the following matrix (let's call it df). Because its distribution is heavily weighted toward 0, I would like to compute a bootstrapped mean and 95% confidence interval for each column, and add the mean and CIs to the bottom of the matrix as new rows. This is a small subset of the data; the true data has >600 rows, which will make the bootstrapping much more effective.

row.names   V183    V184    V185    V186    V187    V188    V189    V190    V191    V192    V193    V194    V195    V196    V197    V198    V199    V200    V201    V202    V203    V204    V205
1   0.07142857  0.07142857  0.07142857  0.07142857  0.07142857  0.07142857  0.07142857  0.07142857  0.07692308  0.07692308  0.07692308  0.07692308  0.07692308  0.07692308  0.07692308  0.07692308  0.07692308  0.07692308  0.07692308  0.07692308  0.07692308  NA  NA
2   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
3   0.022   0.022   0.022   0.022   0.022   0.022   0.022   0.022   0.022   0.022   0.022   0.022   0.022   0.022   0.022   0.022   0   NA  NA  NA  NA  NA  NA
4   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0.07692308  0.07692308  0.07692308  0.07692308  0.07692308  0.07692308  0.07692308  0.07692308
5   0   0   0   0   0.066   0.066   0.066   0.066   0.066   0.066   0.066   0.066   0.066   0.066   0   0   0   0   0   0   0   0   0
6   0.077   0.077   0.077   0.077   0.077   0.077   0.077   0.077   0.077   0.077   0.077   0.077   0.077   0.077   0.077   0.077   0.077   0.077   0.077   0   0   0   0
7   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
8   0.07142857  0.07142857  0.07142857  0.07142857  0.07142857  0.07142857  0.07142857  0.07142857  0.07142857  0.07142857  0.07142857  0   0   0   0   0   0   0   0   0   0   0   0
9   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   NA  NA  NA  NA  NA  NA
10  0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
11  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  0.03225806  0.03225806  0.03225806  0.03225806  0.03225806  0.03225806  0.03225806  0.03225806  0.03225806  0.03225806  0.03225806
12  0   0   0   0   0   0   0   0   0   0   0   0   0   NA  NA  NA  NA  NA  NA  NA  NA  NA  NA
13  0   0   0   0   0   0   0   0   0   NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA
14  0   0   0.033   0.033   0.033   0.033   0.033   0.033   0.033   0.033   0.033   0.033   0.033   0.033   0.033   0.033   0.033   0   0   0   0   0   0

I have tried this:

 boot.mean <- function(df,i){boot.mean <- mean(df[i])}
 df["BootMean" ,] <- boot(df, boot.mean, R = 2000)

But it says "undefined columns selected".

So I tried this:

 boot.mean <- function(df[1:23],i){boot.mean <- mean(df[i])}
 df["BootMean" ,] <- boot(df, boot.mean, R = 2000)

But it says there is a "[" that it doesn't like.

I recently tried this:

 n<-length(df)
 B<-1000
 boot.mean <- function(df,i){boot.mean <- mean(df[,i],na.rm = TRUE)}
 df["BootMean" ,] <-for (i in 1:n) {
 boot(df[1:14,i],boot.mean,R=B)
 }

But I receive an "error in evaluating the argument 'x' in selecting a method for function 'mean': Error in df[, i] : incorrect number of dimensions".
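
For reference, that last error is what R reports when `[, i]` is applied to a plain vector: `df[1:14, i]` with a single `i` drops to a vector, so the statistic has to index it as `x[i]` instead. A minimal sketch of that single-column case follows; the vector `x` is a made-up stand-in for one column, `boot_mean` is an illustrative name, and the interval shown is a simple percentile interval rather than anything `boot.ci()` would compute.

```r
library(boot)  # recommended package shipped with standard R installations

# Hypothetical stand-in for one column of the matrix (mostly zeros, one NA)
x <- c(0.0714, 0, 0.022, 0, 0, 0.077, 0, 0.0714, 0, 0, NA, 0, 0, 0)

# boot() calls the statistic with the data and a vector of resampled indices.
# Because x is a plain vector, it is indexed as x[i], not x[, i].
boot_mean <- function(x, i) mean(x[i], na.rm = TRUE)

set.seed(1)  # only so the sketch is reproducible
b <- boot(x, boot_mean, R = 1000)

b$t0                            # mean of the original sample
quantile(b$t, c(0.025, 0.975))  # simple percentile 95% interval
```

Here `b$t` holds the 1000 bootstrap replicates of the mean, so any summary of the bootstrap distribution can be taken from it.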

Do I need to use an apply function or something? Please help, my brain is hurting over this trivial problem!

*****I've made some progress, but am not all the way there yet.

I've been able to get a bootstrapped mean for a single column by subsetting it out, but I am unable to incorporate na.rm=T into the function, so I also have to remove the NAs manually. Can anyone suggest a way to add the na.rm argument?

df<-subset(dfboot,F_BS_sub[1:323, 1]>=0)
dfa<-df[,1]
dfb<-subset(dfa,V183>=0)
boot.mean <- function(dfb, d) {
  E=dfb[d,]
  return(mean(E))}
b = boot(dfb, boot.mean, R=1000)
b
  • You are not selecting columns from your matrix correctly. It should be `mean(df[,i])`. Plus, since you have NA values, you might want `mean(df[,i], na.rm=T)` – MrFlick Jul 06 '14 at 18:53
  • I don't think bootstrapping necessarily corrects for the limitations in the size of the data. What does it mean to ask for the 95%-ile for a sample size of 12???? – IRTFM Jul 06 '14 at 19:02
  • Hi BD, sorry, this is a sample dataset from a dataset with >600 rows. I mentioned that in the first paragraph, but I should have been more clear. I'm trying to apply MrFlick's suggestions, as he is totally correct but I am still unable to make it work. – ctlamb Jul 06 '14 at 19:06
  • I have now tried multiple approaches, including for loops and MrFlick's suggestions, but I still get "undefined columns selected". Any suggestions on this? – ctlamb Jul 06 '14 at 19:41
  • I've made a bit of progress as shown above, but am unsure how to include a na.rm function in the new formula. I will then try to make a for loop for the new formula and apply it to the matrix. Look for the ***** in the original post for the code on my latest breakthrough – ctlamb Jul 06 '14 at 22:15
  • Did you try `return(mean(E, na.rm=T))`? It's not "nice" to really change a question. If you are having a new problem, it's best to post a new question. If the first one was just a silly typo, it's best to delete it, as it is unlikely to help anyone in the future. It really only makes sense to edit to add missing information or other details that people ask for. – MrFlick Jul 06 '14 at 22:35
  • ah, that worked MrFlick, thanks. I'll repost the question and fix this one up. Thanks for all your help – ctlamb Jul 07 '14 at 12:56
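
Putting the comments together, one way the column-wise version could look is sketched below. This is only an illustration under stated assumptions: the small data frame is made up, `boot_mean` and `boot_summary` are hypothetical helper names, NAs are dropped per column before resampling (rather than via `na.rm` inside the statistic), and the CI is a plain percentile interval taken with `quantile()`.

```r
library(boot)  # recommended package shipped with standard R installations

# Hypothetical small data frame standing in for df (zero-heavy, with an NA)
df <- data.frame(
  V183 = c(0.0714, 0, 0.022, 0, 0, 0.077, 0, NA),
  V184 = c(0.0714, 0, 0.022, 0, 0.066, 0.077, 0, 0)
)

# Statistic for boot(): mean of the resampled values of a plain vector
boot_mean <- function(x, i) mean(x[i])

# For one column: drop NAs, bootstrap the mean, and return the mean of the
# bootstrap replicates together with a percentile 95% interval
boot_summary <- function(col, R = 1000) {
  x <- col[!is.na(col)]
  b <- boot(x, boot_mean, R = R)
  ci <- quantile(b$t, c(0.025, 0.975), names = FALSE)
  c(BootMean = mean(b$t), CI_lower = ci[1], CI_upper = ci[2])
}

set.seed(1)  # only so the sketch is reproducible
stats <- sapply(df, boot_summary)          # 3 x ncol(df) matrix
result <- rbind(df, as.data.frame(stats))  # summaries appended as new rows
```

`sapply()` loops over the columns, so no explicit for loop is needed, and the three summary rows keep the names `BootMean`, `CI_lower`, and `CI_upper` at the bottom of `result`.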
