
I have a data frame with ratings from two raters, which I use to test the reliability of 4 different tests:

test1_rater1<-c(1,4,3,2,3,4,1,2,2,3)
test2_rater1<-c(1,3,3,3,2,3,1,1,2,1)
test3_rater1<-c(1,4,3,4,4,2,3,1,3,4)
test4_rater1<-c(1,3,4,2,3,2,1,2,3,2)
test1_rater2<-c(1,3,3,4,3,4,3,2,1,3)
test2_rater2<-c(1,3,1,3,1,3,3,1,1,1)
test3_rater2<-c(1,3,3,2,4,2,3,4,3,4)
test4_rater2<-c(2,3,4,4,3,2,3,2,3,2)
mydata<-data.frame(test1_rater1,test2_rater1,test3_rater1,test4_rater1,test1_rater2,test2_rater2,test3_rater2,test4_rater2)

# For the kappa statistic, I used:

library(psych)  # provides cohen.kappa()
cohen.kappa(cbind(test1_rater1,test1_rater2))
cohen.kappa(cbind(test2_rater1,test2_rater2))
cohen.kappa(cbind(test3_rater1,test3_rater2))
cohen.kappa(cbind(test4_rater1,test4_rater2))

Since my data frame contains data from over 80 different tests, this approach is quite tedious. I thought about using a list and then the lapply function, but it did not work. Is there a shorter way to do this?

Thanks, Nat

    What language is this? – Shotgun Ninja Aug 31 '15 at 14:27
  • Why did the lapply not work? Please show your code and the expected output. – Heroka Aug 31 '15 at 14:44
  • maybe a starting point [here](http://stackoverflow.com/questions/32203753/export-fixed-range-of-columns-from-dataframe-to-pdf-one-slice-per-sheet) – Tensibai Aug 31 '15 at 14:50
  • You could `split` the column names and then use `lapply` i.e `lapply(split(names(mydata), sub('_.*', '', names(mydata))), function(x) cohen.kappa(mydata[x]) )` – akrun Aug 31 '15 at 15:09
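The split-and-lapply idea in the last comment can be sketched in base R as follows. With psych loaded, the comment's one-liner passes each two-column slice straight to cohen.kappa; here the kappa step is swapped for a hypothetical hand-rolled helper, `unweighted_kappa()`, that computes only the plain unweighted Cohen's kappa, so the sketch runs without extra packages. It assumes every column is named `<test>_<rater>`.

```r
# Ratings from the question: one column per test/rater pair
mydata <- data.frame(
  test1_rater1 = c(1,4,3,2,3,4,1,2,2,3),
  test2_rater1 = c(1,3,3,3,2,3,1,1,2,1),
  test3_rater1 = c(1,4,3,4,4,2,3,1,3,4),
  test4_rater1 = c(1,3,4,2,3,2,1,2,3,2),
  test1_rater2 = c(1,3,3,4,3,4,3,2,1,3),
  test2_rater2 = c(1,3,1,3,1,3,3,1,1,1),
  test3_rater2 = c(1,3,3,2,4,2,3,4,3,4),
  test4_rater2 = c(2,3,4,4,3,2,3,2,3,2)
)

# Hypothetical helper (not psych's API): unweighted Cohen's kappa,
# (p_o - p_e) / (1 - p_e), standing in for psych::cohen.kappa
unweighted_kappa <- function(a, b) {
  lv  <- sort(union(a, b))                       # shared set of score levels
  tab <- table(factor(a, levels = lv), factor(b, levels = lv))
  n   <- sum(tab)
  p_o <- sum(diag(tab)) / n                      # observed agreement
  p_e <- sum(rowSums(tab) * colSums(tab)) / n^2  # chance agreement from marginals
  (p_o - p_e) / (1 - p_e)
}

# Group column names by test prefix: "test1_rater1" -> "test1"
col_groups <- split(names(mydata), sub("_.*", "", names(mydata)))

# One kappa per test, however many tests the data frame holds
kappas <- sapply(col_groups, function(cols) {
  unweighted_kappa(mydata[[cols[1]]], mydata[[cols[2]]])
})
round(kappas, 4)
```

Rounded, these values agree with the unweighted kappa estimates that cohen.kappa reports for the four tests in the answer's output.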

1 Answer


The first step is to tidy your data: instead of having one column for each pair of test and rater, have a column for test, then columns for rater1 and rater2. You can do this restructuring with the dplyr and tidyr packages:

library(dplyr)
library(tidyr)
rearranged_data <- mydata %>%
  mutate(row = row_number()) %>%
  gather(column, value, -row) %>%
  separate(column, c("test", "rater")) %>%
  spread(rater, value)

head(rearranged_data)
#>   row  test rater1 rater2
#> 1   1 test1      1      1
#> 2   1 test2      1      1
#> 3   1 test3      1      1
#> 4   1 test4      1      2
#> 5   2 test1      4      3
#> 6   2 test2      3      3
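If you prefer to avoid dplyr and tidyr for the reshaping step, the same long format can be sketched in base R. This assumes every column follows the `<test>_rater1` / `<test>_rater2` naming pattern; `tests` and `long` are illustrative names.

```r
# Ratings from the question
mydata <- data.frame(
  test1_rater1 = c(1,4,3,2,3,4,1,2,2,3),
  test2_rater1 = c(1,3,3,3,2,3,1,1,2,1),
  test3_rater1 = c(1,4,3,4,4,2,3,1,3,4),
  test4_rater1 = c(1,3,4,2,3,2,1,2,3,2),
  test1_rater2 = c(1,3,3,4,3,4,3,2,1,3),
  test2_rater2 = c(1,3,1,3,1,3,3,1,1,1),
  test3_rater2 = c(1,3,3,2,4,2,3,4,3,4),
  test4_rater2 = c(2,3,4,4,3,2,3,2,3,2)
)

# Test names taken from the column prefixes (assumes "<test>_<rater>" names)
tests <- unique(sub("_.*", "", names(mydata)))

# One block of rows per test, with both raters side by side
long <- do.call(rbind, lapply(tests, function(t) {
  data.frame(row    = seq_len(nrow(mydata)),
             test   = t,
             rater1 = mydata[[paste0(t, "_rater1")]],
             rater2 = mydata[[paste0(t, "_rater2")]],
             stringsAsFactors = FALSE)
}))

# Sort by original row, then by test, to mirror the layout shown above
long <- long[order(long$row, long$test), ]
rownames(long) <- NULL
head(long)
```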

Now you can compute Cohen's kappa within each test. This requires a function that turns a kappa object into a data frame, such as this one:

library(broom)
tidy_kappa <- function(x) {
  broom::fix_data_frame(x$confid, newcol = "type")
}
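If you would rather not depend on broom for this conversion, a base R version is a short sketch. `tidy_kappa_base` and `mock_kappa` are hypothetical names; the mock only imitates the `confid` matrix of a psych kappa object (row names for the kappa type, columns lower/estimate/upper), filled with the test1 numbers from the output further down.

```r
# Hypothetical base R stand-in for tidy_kappa(): moves the row names of
# the confid matrix into a "type" column and keeps the interval columns
tidy_kappa_base <- function(x) {
  ci <- x$confid
  data.frame(type = rownames(ci), ci, row.names = NULL,
             stringsAsFactors = FALSE, check.names = FALSE)
}

# Mock object shaped like the confid element of a psych kappa result
mock_kappa <- list(confid = matrix(
  c(0.08574000, 0.4594595, 0.8331789,
    0.07284356, 0.5238095, 0.9747755),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("unweighted kappa", "weighted kappa"),
                  c("lower", "estimate", "upper"))))

tidy_kappa_base(mock_kappa)
```

In the pipeline below, `tidy_kappa_base()` could be dropped in wherever `tidy_kappa()` is called.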

If you install the newest version of my broom package with devtools::install_github("dgrtwo/broom"), you could just use the tidy method, as I've just added one like this to the package.

Now you can perform your cohen.kappa tests with dplyr's group_by and do:

library(psych)
results <- rearranged_data %>%
  group_by(test) %>%
  do(tidy_kappa(cohen.kappa(cbind(.$rater1, .$rater2))))
results
#> Source: local data frame [8 x 5]
#> Groups: test
#> 
#>    test             type       lower  estimate     upper
#> 1 test1 unweighted kappa  0.08574000 0.4594595 0.8331789
#> 2 test1   weighted kappa  0.07284356 0.5238095 0.9747755
#> 3 test2 unweighted kappa -0.10654813 0.3333333 0.7732148
#> 4 test2   weighted kappa -0.09877879 0.4444444 0.9876677
#> 5 test3 unweighted kappa  0.19876127 0.5833333 0.9679054
#> 6 test3   weighted kappa -0.39241493 0.3577982 1.1080113
#> 7 test4 unweighted kappa  0.21116862 0.5714286 0.9316885
#> 8 test4   weighted kappa -0.02324226 0.4444444 0.9121311
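As a sanity check, the unweighted estimate for test1 can be reproduced by hand from the standard definition of Cohen's kappa. Here $p_o$ is the share of subjects on which both raters agree (6 of the 10 for test1), and $p_e$ is the agreement expected by chance from each rater's score frequencies (rater1 used scores 1 to 4 with counts 2, 3, 3, 2; rater2 with counts 2, 1, 5, 2):

$$
\kappa = \frac{p_o - p_e}{1 - p_e}, \qquad
p_o = \frac{6}{10} = 0.6, \qquad
p_e = \frac{2\cdot 2 + 3\cdot 1 + 3\cdot 5 + 2\cdot 2}{10^2} = 0.26
$$

$$
\kappa = \frac{0.6 - 0.26}{1 - 0.26} = \frac{0.34}{0.74} \approx 0.4595
$$

This matches the 0.4594595 in the first row. The weighted rows use cohen.kappa's weighting of disagreements, so they do not follow this simple formula.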

The group_by() and do() approach works no matter how many tests you have, producing two rows per test (one for unweighted kappa and one for weighted kappa, just like the output of the cohen.kappa function). This output format is also convenient for graphing or further analyzing the results:

library(ggplot2)
ggplot(results, aes(estimate, test)) +
  geom_point() +
  geom_errorbarh(aes(xmin = lower, xmax = upper)) +
  facet_wrap(~ type) +
  geom_vline(xintercept = 0, color = "red", linetype = 2)


David Robinson