
I am sure this is related to Bootstrapping Krippendorff's Alpha, but I understood neither the question nor the answers there. The answers and comments there even seem to contradict each other.

library(irr)
set.seed(0)
df <- data.frame(a = rep(sample(1:4),10), b = rep(sample(1:4),10))
# kripp.alpha() expects raters in rows and subjects in columns, hence t()
kripp.alpha(t(df))

This is the output.

 Krippendorff's alpha

 Subjects = 40 
   Raters = 2 
    alpha = 0.342 

How can I compute the confidence interval here?

buhtz

1 Answer


You are right, it is connected to bootstrapping. You could compute the confidence interval the following way:

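 library(irr)
 library(boot)

 # Statistic for boot(): d is the data, w the resampled row (subject) indices
 alpha.boot <- function(d,w) {
        data <- t(d[w,])   # kripp.alpha() expects raters in rows
        kripp.alpha(data)$value
 }

 b <- boot(data = df, statistic = alpha.boot, R = 1000)  # 1000 bootstrap replicates
 b
 plot(b)
 boot.ci(b, type = "perc")  # 95% percentile interval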

This is the output:

 Bootstrap Statistics :
      original      bias    std. error
 t1* 0.3416667 -0.01376158   0.1058123

 BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
 Based on 1000 bootstrap replicates

 CALL : 
 boot.ci(boot.out = b, type = "perc")

 Intervals : 
 Level     Percentile     
 95%   ( 0.1116,  0.5240 )  
 Calculations and Intervals on Original Scale
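To make it clearer what the bootstrap does here, the following is a minimal base-R sketch of the same idea without `boot` or `irr`. It assumes nominal ratings with no missing values, and `alpha_nominal` is a hypothetical helper written for this illustration (it is not the `irr` implementation). The example data are made up so that two of every four subjects agree; they are not the `df` from the question. The interval from `quantile()` may differ slightly from `boot.ci()`, which interpolates the percentiles in its own way.

```r
# Hypothetical helper: Krippendorff's alpha for nominal data, complete ratings only.
# `ratings` is a subjects x raters matrix or data frame.
alpha_nominal <- function(ratings) {
  ratings <- as.matrix(ratings)
  m <- ncol(ratings)                  # number of raters
  n <- length(ratings)                # total number of ratings
  # Observed disagreement: share of disagreeing rater pairs within subjects
  pairs <- combn(m, 2)
  disagree <- sum(ratings[, pairs[1, ]] != ratings[, pairs[2, ]])
  d_obs <- disagree / (nrow(ratings) * ncol(pairs))
  # Expected disagreement from the pooled category counts
  counts <- table(ratings)
  d_exp <- (n^2 - sum(counts^2)) / (n * (n - 1))
  1 - d_obs / d_exp                   # NaN if every rating is identical
}

# Illustrative data (an assumption): 40 subjects, 2 raters, half of them agree
ratings <- data.frame(a = rep(c(1, 2, 3, 4), 10),
                      b = rep(c(1, 2, 4, 3), 10))

set.seed(1)
R <- 1000
# Resample subjects (rows) with replacement and recompute alpha each time
boot_alpha <- replicate(R, {
  idx <- sample(nrow(ratings), replace = TRUE)
  alpha_nominal(ratings[idx, ])
})

# Percentile interval: the 2.5% and 97.5% quantiles of the replicates
ci <- quantile(boot_alpha, c(0.025, 0.975))
ci
```

The key design point is that whole subjects are resampled, so the pairing of the two raters' scores within a subject is preserved in every replicate.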

There is also an R script from Zapf et al. (2016): look for Additional file 3 at the bottom of the article page, just before the references.

Alternatively, you could use the kripp.boot function, available on GitHub at MikeGruz/kripp.boot.

Kev
  • Why are there 1000 replicates? Why not more or fewer? – buhtz Feb 04 '17 at 19:55
  • [Zapf et al. 2016](https://bmcmedresmethodol.biomedcentral.com/articles/10.1186/s12874-016-0200-9) used 1000 because of [Efron](https://statistics.stanford.edu/sites/default/files/BIO%20139.pdf). – Kev Feb 04 '17 at 22:12
  • If I understand the paper correctly, your answer can be used for Fleiss' kappa, too? – buhtz Feb 07 '17 at 00:22
  • Yes, for nominal data that is true: "In the case of nominal data and no missing values, Fleiss’ K and Krippendorff’s alpha can be recommended equally for the assessment of inter-rater reliability. However, if the measurement scale is not nominal and/or missing values (completely at random) are present, only Krippendorff’s alpha is appropriate" (https://bmcmedresmethodol.biomedcentral.com/articles/10.1186/s12874-016-0200-9). – Kev Feb 07 '17 at 06:11