
I have a data frame with 27 samples divided into 3 strata. I want to replicate a weighted mean 500 times, where the mean is calculated from a random selection of 3 samples per stratum and the weights are the relative areas of the strata.

My idea was to loop over the strata, select samples from each one, and compute the mean. I can compute the simple mean of the selection, but not the weighted mean (I have no idea how to extract the weights and the values together):

# data
DF <- data.frame(v = c(16, 42, 63, 15, 42, 63, 85, 16, 43),
                 s = c(1, 3, 2, 2, 1, 3, 3, 1, 2),
                 w = c(0.2, 0.5, 0.3, 0.3, 0.2, 0.5, 0.5, 0.2, 0.3),
                 stringsAsFactors = TRUE)
# simple mean
x <- c()
for (i in 1:3) {
  x.tm <- sample(subset(DF$v, DF$s == i), 2, replace = TRUE)
  x <- c(x, x.tm)
  d <- mean(x)
}

Furthermore, I'm confused about the replicate function and how to put the weighted mean inside it. For example, trying it with the simple mean, I obtained an empty list:

t <- replicate(500, {
  for (i in 1:3) {
    x.tm <- sample(subset(DF$v, DF$s == i), 2, replace = TRUE)
    x <- c(x, x.tm)
    d <- mean(x)
  }
})

I have also tried using the boot::boot command but the result was the same.

Michael Petch
Matt_4
  • So I guess `x` stands for the values, `w` for the weights and `s` for the stratum ID ? Before we talk about bootstrapping, can you tell us what do you precisely want ? Here's what I understand from your code. For stratum number 1, you draw 2 values randomly (with the associated weights hopefully), then you want to calculate the (weighted) mean for each stratum or to calculate the (weighted) mean for all 6 values (3 strata x 2 drawn values) ? – kluu Feb 28 '18 at 15:55
  • Sorry, "v" stands for the values of my population while "x" stands for the extracted values. I want to compute the weighted mean for all 6 values. My final goal is to obtain a database with 500 averages and then to calculate the 2.5 and the 97.5 percentile and the mean about these. – Matt_4 Feb 28 '18 at 16:18
  • A similar approach is presented here: https://www.researchgate.net/profile/Jianwei_Zhang9/publication/301599779_Sample_Sizes_to_Control_Error_Estimates_in_Determining_Soil_Bulk_Density_in_California_Forest_Soils/links/57fbbb9508ae329c3d4979de/Sample-Sizes-to-Control-Error-Estimates-in-Determining-Soil-Bulk-Density-in-California-Forest-Soils.pdf – Matt_4 Feb 28 '18 at 16:18
  • Yeah my bad, I meant `v` but nevermind, the answer proposed by Terru_theTerror should do the job. – kluu Feb 28 '18 at 16:45

1 Answer


This is a possible way.

A function that draws 2 rows at random (with replacement) from each stratum s = 1, 2, 3 and returns the weighted mean of v, weighted by w, over all 6 sampled rows:

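fun <- function(DF) {

  # split the data by stratum
  s <- c(1, 2, 3)
  DF_sub_1 <- DF[as.numeric(as.character(DF$s)) == s[1], ]
  DF_sub_2 <- DF[as.numeric(as.character(DF$s)) == s[2], ]
  DF_sub_3 <- DF[as.numeric(as.character(DF$s)) == s[3], ]

  # draw 2 row indices per stratum, with replacement
  x.tm_1 <- sample(nrow(DF_sub_1), 2, replace = TRUE)
  x.tm_2 <- sample(nrow(DF_sub_2), 2, replace = TRUE)
  x.tm_3 <- sample(nrow(DF_sub_3), 2, replace = TRUE)

  # combine the sampled rows into one data frame
  DF_sample <- rbind(DF_sub_1[x.tm_1, ], DF_sub_2[x.tm_2, ], DF_sub_3[x.tm_3, ])

  # weighted mean of v (weights w) over all 6 sampled rows
  out <- weighted.mean(DF_sample$v, DF_sample$w)

  return(out)
}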

Replicate it 500 times:

output<-replicate(500,fun(DF))
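
As for the empty list in the question: `replicate()` collects the value of the expression it is given, and a `for` loop evaluates to `NULL`, so nothing useful comes back unless the block ends with the value itself. A minimal sketch of the difference (the names `res_null` and `res_ok` are only illustrative):

```r
# A for loop evaluates to NULL, so replicate() only collects NULLs
res_null <- replicate(3, {
  for (i in 1:3) d <- i
})
str(res_null)  # a list of NULLs

# Ending the block with the value makes replicate() return it
res_ok <- replicate(3, {
  for (i in 1:3) d <- i
  d
})
res_ok         # a plain vector, one element per replicate
```

In the question's version, `x` would also need to be reset (`x <- c()`) at the start of the block; otherwise each replicate starts from whatever `x` already exists in the workspace.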

`output` holds 500 weighted means, each computed over all 6 sampled values (2 per stratum):

output
  [1] 46.00 41.15 58.70 51.50 61.70 49.00 58.70 61.70 50.60 49.00 44.70 46.25 46.40 52.80 67.20 32.90 36.55
 [18] 47.95 42.05 45.35 40.75 57.10 40.75 44.70 51.85 48.90 40.10 43.75 54.40 53.20 47.95 51.50 51.90 47.30
 [35] 58.30 54.50...
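
To get to the goal stated in the comments (the mean and the 2.5 and 97.5 percentiles of the 500 averages), something along these lines should work. It is only a sketch: `strat_wmean`, its argument `n`, and the names `boot_means` and `ci` are made up for illustration, and it generalises the function above to any number of strata rather than reproducing it exactly.

```r
# Example data from the question
DF <- data.frame(v = c(16, 42, 63, 15, 42, 63, 85, 16, 43),
                 s = c(1, 3, 2, 2, 1, 3, 3, 1, 2),
                 w = c(0.2, 0.5, 0.3, 0.3, 0.2, 0.5, 0.5, 0.2, 0.3))

# Hypothetical helper: draw n rows per stratum (with replacement) and
# return the weighted mean of v, weighted by w, over all sampled rows
strat_wmean <- function(df, n = 2) {
  idx_by_stratum <- split(seq_len(nrow(df)), df$s)
  rows <- unlist(lapply(idx_by_stratum, function(idx) {
    # index via sample.int() so a stratum with a single row is handled correctly
    idx[sample.int(length(idx), n, replace = TRUE)]
  }), use.names = FALSE)
  weighted.mean(df$v[rows], df$w[rows])
}

set.seed(123)  # arbitrary seed, only for reproducibility
boot_means <- replicate(500, strat_wmean(DF, n = 2))

mean(boot_means)                             # mean of the 500 averages
ci <- quantile(boot_means, c(0.025, 0.975))  # 2.5 and 97.5 percentiles
ci
```

Because every stratum contributes the same number of rows, weighting each row by its stratum's `w` gives the same result as weighting the per-stratum means by `w`.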
Terru_theTerror
  • Thank you for your help! However, I don't need 3 different averages but the weighted mean of all 6 values. – Matt_4 Feb 28 '18 at 16:23