
I have a data frame df with a column X containing normally distributed values across 1,000,000 rows. The maximum value of X is 0.8. Using R (and perhaps the "boot" package), I would like to bootstrap with replacement to estimate how unlikely it is to get max(df$X) = 0.8 from my data. For this, I could take n bootstrap samples from X and calculate the maximum of each sample. Then I could take the standard deviation of those sample maxima and see how far 0.8 is in terms of this standard deviation. Does anyone know how to do this bootstrapping in R? Any suggestions are welcome!

StupidWolf
Lucas
  • https://www.statmethods.net/advstats/bootstrapping.html or countless other tutorials would be a good starting point. – thelatemail Jun 28 '18 at 02:25
  • I'm not sure I understand your question. If `X` is normally distributed, then the probability that `max(X) = 0.8` is zero. – Maurits Evers Jun 28 '18 at 03:38

1 Answer


This bootstraps from x, where x is a sample from a normal distribution. boot() needs a statistic function, which must take at least the data and a vector of indices as its arguments. See the R documentation for the boot package for more details.

The max_x function below checks whether the maximum of a bootstrapped sample equals max(x). Note that the test data (x) in the code below has a different maximum than 0.8, but the conceptual framework remains the same:

library(boot)                                      # provides boot()

set.seed(101)
x <- rnorm(1000, mean = 0.4, sd = 0.2)             # normally distributed test data

# statistic for boot(): 1 if the resample contains the observed maximum, else 0
max_x <- function(data, indices) {
  m <- max(data[indices])
  if (m == max(data)) 1 else 0
}

results <- boot(data = x, statistic = max_x, R = 1000)   # 1000 replications

mean(results$t == 1)                               # proportion of resamples containing the observed maximum
# 0.618

results
# ORDINARY NONPARAMETRIC BOOTSTRAP

# Call:
# boot(data = x, statistic = max_x, R = 1000)
# Bootstrap Statistics :
#     original  bias    std. error
# t1*        1  -0.382   0.4861196

plot(results)
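
As a sanity check on the 0.618 above: with continuous data the maximum is attained by a single observation, and the chance that a given observation appears at least once in a resample of size n is 1 - (1 - 1/n)^n, which approaches 1 - 1/e (about 0.632) as n grows. A minimal sketch in base R; the names `n` and `p_included` are introduced here for illustration only:

```r
n <- 1000                          # sample size of the test data above
p_included <- 1 - (1 - 1/n)^n      # P(a given observation appears in a resample)
p_included                         # theoretical counterpart of mean(results$t == 1)
1 - exp(-1)                        # large-n limit of the same probability
```

So a value near 0.63 is expected for any continuous sample of this size; it reflects the resampling scheme rather than anything specific about the data.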

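The question also asks for the spread of the bootstrap maxima themselves. A rough sketch of that, assuming the same test data as above; `boot_max`, `max_results` and `maxima` are hypothetical names, not part of the answer's code:

```r
library(boot)

set.seed(101)
x <- rnorm(1000, mean = 0.4, sd = 0.2)     # same test data as in the answer

# statistic for boot(): the maximum of each resample
boot_max <- function(data, indices) max(data[indices])

max_results <- boot(data = x, statistic = boot_max, R = 1000)
maxima <- as.vector(max_results$t)         # one maximum per replication

sd(maxima)                                 # standard deviation of the bootstrap maxima
length(unique(maxima))                     # number of distinct maxima observed
mean(maxima == max(x))                     # share of resamples that reach the observed maximum
```

A resample can never contain a value larger than max(x), so every bootstrap maximum is at most the observed one. The standard deviation computed this way therefore says little about how extreme the observed maximum is, which is worth keeping in mind before interpreting it.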

Mankind_008