This is an example of bootstraping using the R package rsample. There are many other possibilities like boot, bootstrap or manual coding but I like this approach since it's very flexible. First of all I load the packages
library(dplyr)
library(purrr)
library(rsample)
library(yardstick)
#> For binary classification, the first factor level is assumed to be the event.
#> Set the global option `yardstick.event_first` to `FALSE` to change this.
That warning means that we need to set options(yardstick.event_first = FALSE)
since the first factor level for gold_std
, test1
and test2
is 0.
options(yardstick.event_first = FALSE)
Then i load the data
data <- tibble(
ID = 1:25,
gold_std = factor(c(0, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 1, 0, 1, 1, 1, 0, 0, 0)),
test_1 = factor(c(0, 1, 1, 0, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 1, 1, 0, 0)),
test_2 = factor(c(0, 1, 1, 0, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0))
)
The confusion matrix is:
table(select(data, gold_std, test_1))
#> test_1
#> gold_std 0 1
#> 0 8 7
#> 1 1 9
which means that the sensitivity is equal to 9 / (9 + 1) and specificity is equal to 8 / (7 + 8). Indeed
sens(data, truth = gold_std, estimate = test_1)
#> # A tibble: 1 x 3
#> .metric .estimator .estimate
#> <chr> <chr> <dbl>
#> 1 sens binary 0.9
and
spec(data, truth = gold_std, estimate = test_1)
#> # A tibble: 1 x 3
#> .metric .estimator .estimate
#> <chr> <chr> <dbl>
#> 1 spec binary 0.533
The functions sens
e spec
are defined in the R package yardstick
. I can use the R package rsample
to create 30 bootstrap replicates of the data:
set.seed(1)
bootstrap_data <- bootstraps(data, times = 30)
and this is the result
bootstrap_data
#> # Bootstrap sampling
#> # A tibble: 30 x 2
#> splits id
#> <list> <chr>
#> 1 <split [25/9]> Bootstrap01
#> 2 <split [25/13]> Bootstrap02
#> 3 <split [25/11]> Bootstrap03
#> 4 <split [25/7]> Bootstrap04
#> 5 <split [25/10]> Bootstrap05
#> 6 <split [25/10]> Bootstrap06
#> 7 <split [25/8]> Bootstrap07
#> 8 <split [25/7]> Bootstrap08
#> 9 <split [25/9]> Bootstrap09
#> 10 <split [25/7]> Bootstrap10
#> # ... with 20 more rows
bootstrap_data
is a new dataframe with 30 rows (as many as the number of boostrap replications) and 2 columns: splits
e id
. The column splits
records the number of observations excluded from each bootstrap replicate while id
is simply and id column. For example in the first replicate of the bootstrap samples 9 observations were excluded. Moreover we can use the function analysis
to retrive the observations sampled in each bootrstrap replicate. This is a more efficient way to store the indexes of a bootstrap sampling.
Now I can define two function which take as input a split object and return the estimate of sensitivity and specificity for that boostrap sample.
bootstrap_sens <- function(splits) {
x <- analysis(splits)
sens(x, truth = gold_std, estimate = test_1)$.estimate
}
bootstrap_spec <- function(splits) {
x <- analysis(splits)
spec(x, truth = gold_std, estimate = test_1)$.estimate
}
I use the notation $.estimate
since I need just the estimate of sensitivity and specificity (and you can see from the output at the beginning that they return a dataframe). I can apply these functions to the bootstrap replicates using map_dbl
(since the output of both functions is an object of type double) and this is the result:
bootstrap_data <- bootstrap_data %>%
mutate(
sens = map_dbl(splits, bootstrap_sens),
spec = map_dbl(splits, bootstrap_spec)
)
bootstrap_data
#> # Bootstrap sampling
#> # A tibble: 30 x 4
#> splits id sens spec
#> * <list> <chr> <dbl> <dbl>
#> 1 <split [25/9]> Bootstrap01 0.727 0.571
#> 2 <split [25/13]> Bootstrap02 1 0.471
#> 3 <split [25/11]> Bootstrap03 1 0.312
#> 4 <split [25/7]> Bootstrap04 0.9 0.533
#> 5 <split [25/10]> Bootstrap05 0.8 0.533
#> 6 <split [25/10]> Bootstrap06 1 0.364
#> 7 <split [25/8]> Bootstrap07 0.9 0.6
#> 8 <split [25/7]> Bootstrap08 1 0.533
#> 9 <split [25/9]> Bootstrap09 0.667 0.625
#> 10 <split [25/7]> Bootstrap10 0.667 0.75
#> # ... with 20 more rows
Created on 2019-07-22 by the reprex package (v0.3.0)
You can use the same approach for test_2
. I'm not sure what you mean by likelihood ratios
but you can define your own function to evalute it and apply it in the same way as bootstrap_sens
or bootstrap_spec
. You can find much more details about rsample at the package website.