I'm making a scatter plot of two statistics, X(e) and Y(e), for various values of a scalar parameter e. Neither X nor Y has a normal sampling distribution.

Now I want to calculate a 2-D confidence interval for each point (X(e),Y(e)) in the scatter plot. How do I do that?

Because the sampling distributions are not normal, I'm using the boot package in R. With it I can calculate a confidence interval for each of X(e) and Y(e) independently. Is this approach statistically sound, or should I sample from the 2-D (joint) sampling distribution? If so, how do I do that?

Luc
  • You *"can calculate a confidence interval for each X(e) and Y(e) independently."* Are the variables independent? – Rui Barradas Sep 15 '22 at 13:36
  • If X(e) and Y(e) are both functions of the random variable e, then shouldn't you just work out the confidence interval for e and calculate X and Y of these limits? – Allan Cameron Sep 15 '22 at 13:43
  • e is not a random variable. It is a scalar. X(e) and Y(e) are random variables though. – Luc Sep 15 '22 at 14:49
  • @RuiBarradas how can I check that both random variables are independent? – Luc Sep 15 '22 at 14:50
  • You can use the bootstrap to incorporate correlation as well as distribution information into simultaneous confidence intervals; see https://www.wiley.com/en-us/Resampling+Based+Multiple+Testing:+Examples+and+Methods+for+p+Value+Adjustment-p-9780471557616 – BigBendRegion Sep 20 '22 at 17:00

1 Answer

Here is a way to use the boot package to bootstrap a statistic, here the mean, for two vectors simultaneously. Each resample draws whole rows, so the two bootstrapped means stay paired.

df1 <- iris[1:50, 1:2]
head(df1)
#>   Sepal.Length Sepal.Width
#> 1          5.1         3.5
#> 2          4.9         3.0
#> 3          4.7         3.2
#> 4          4.6         3.1
#> 5          5.0         3.6
#> 6          5.4         3.9

library(boot)

bootfun <- function(x, i) colMeans(x[i,])

R <- 1000L
set.seed(2022)

b <- boot(df1, bootfun, R)
colMeans(b$t)
#> [1] 5.010754 3.431042
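# Note: boot.ci() defaults to index = 1:2 here, so it reports intervals for
# the first statistic (mean Sepal.Length) and treats the second column as a
# variance estimate; the studentized interval below is therefore not
# meaningful. Pass index = 1 or index = 2 to select one statistic.
boot.ci(b)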
#> BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
#> Based on 1000 bootstrap replicates
#> 
#> CALL : 
#> boot.ci(boot.out = b)
#> 
#> Intervals : 
#> Level      Normal              Basic             Studentized     
#> 95%   ( 4.905,  5.097 )   ( 4.902,  5.092 )   ( 4.904,  5.093 )  
#> 
#> Level     Percentile            BCa          
#> 95%   ( 4.920,  5.110 )   ( 4.914,  5.096 )  
#> Calculations and Intervals on Original Scale

Created on 2022-09-15 with reprex v2.0.2
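
The boot.ci(b) call above reports intervals for the first column only. As a minimal sketch (not part of the original answer), the `index` argument of boot.ci() can select each statistic in turn. The Bonferroni-adjusted confidence level is an assumption of this sketch: it is one simple, conservative way to make the rectangle formed by the two intervals cover both means jointly at about 95%. The names `joint_conf`, `ci` and `rep_cor` are illustrative.

```r
library(boot)

df1 <- iris[1:50, 1:2]
bootfun <- function(x, i) colMeans(x[i, ])

set.seed(2022)
b <- boot(df1, bootfun, R = 1000L)

# Bonferroni adjustment (assumption of this sketch): split the 5% error
# rate evenly across the statistics so the rectangle covers them jointly.
joint_conf <- 1 - 0.05 / ncol(b$t)

# One percentile interval per statistic, selected with `index`.
ci <- lapply(seq_len(ncol(b$t)), function(j) {
  boot.ci(b, conf = joint_conf, type = "perc", index = j)$percent[4:5]
})
names(ci) <- colnames(df1)
ci

# The replicates are paired (same resampled rows), so their correlation
# indicates how strongly the two statistics move together.
rep_cor <- cor(b$t[, 1], b$t[, 2])
rep_cor
```

A correlation far from zero suggests that a rectangle built from two separate intervals is a poor description of the joint uncertainty, and that a region built from the paired replicates in `b$t` may be preferable.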


Computing intervals for two different bootstrapped statistics is more involved, because boot.ci reports intervals for only one statistic per call.
One option is to create a list with the statistics of interest and a function that bootstraps the statistics in the list one by one. The return value of this function is a list of objects of class "boot", and an lapply loop can then compute the confidence intervals. Note that each statistic gets its own set of resamples, so the replicates of X and Y are not paired.

library(boot)

bootfun2 <- function(data, stat_list, R, ...) {
  stat <- function(x, i, f) {
    y <- x[i]
    f(y)
  }
  lapply(stat_list, \(f) {
    boot(data, stat, R = R, f = f)
  })
}

R <- 1000L
set.seed(2022)

e <- df1[[1]]

flist <- list(X = mean, Y = sd)
blist <- bootfun2(e, flist, R)
ci_list <- lapply(blist, boot.ci)
#> Warning in FUN(X[[i]], ...): bootstrap variances needed for studentized
#> intervals

#> Warning in FUN(X[[i]], ...): bootstrap variances needed for studentized
#> intervals

ci_list[[1]]$percent[4:5]
#> [1] 4.920000 5.109949
ci_list[[2]]$percent[4:5]
#> [1] 0.2801168 0.4103270

ci_list[[1]]$bca[4:5]
#> [1] 4.91400 5.09639
ci_list[[2]]$bca[4:5]
#> [1] 0.2956919 0.4308415

Created on 2022-09-15 with reprex v2.0.2
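
If a joint (2-D) region is wanted rather than two separate intervals, a possible sketch is to return both statistics from one statistic function, so that each row of the replicate matrix is a paired (X, Y) draw. The elliptical region below, defined by the empirical 95% quantile of the squared Mahalanobis distances of the replicates, is an assumption of this sketch and only one of several ways to build such a region. The names `stat2`, `b2`, `centre`, `d2`, `radius2` and `inside` are illustrative.

```r
library(boot)

e <- iris[1:50, 1]

# Return both statistics from a single resample so that each row of
# b2$t holds a paired (mean, sd) replicate.
stat2 <- function(x, i) {
  y <- x[i]
  c(X = mean(y), Y = sd(y))
}

set.seed(2022)
b2 <- boot(e, stat2, R = 1000L)

# Marginal percentile intervals, one boot.ci() call per statistic.
ci_X <- boot.ci(b2, type = "perc", index = 1)$percent[4:5]
ci_Y <- boot.ci(b2, type = "perc", index = 2)$percent[4:5]

# Squared Mahalanobis distance of each replicate from the replicate mean.
# Its empirical 95% quantile sets the size of an ellipse that contains
# about 95% of the bootstrap replicates.
centre <- colMeans(b2$t)
d2 <- mahalanobis(b2$t, centre, cov(b2$t))
radius2 <- quantile(d2, 0.95)
inside <- d2 <= radius2

# Proportion of replicates falling inside the region.
mean(inside)
```

Plotting `b2$t` and colouring the points by `inside` gives a quick visual check of the region's shape. If the replicate cloud is clearly not elliptical, a different region (for example, one based on a 2-D density estimate) may describe it better.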

Rui Barradas