
In my dataset, I have 5 readers who have classified tests (as either 0, 1 or 2) repeatedly over multiple days.

On each day, only 2-3 readers out of the 5 actually classified tests.

library(tidyverse)
library(broom)
library(psych)  # provides ICC()

df <- tibble(day = rep(1:10,10)) %>%
  arrange(day) %>%
  mutate(reader1 = rep(c(1, 2, 0, 0, 2, NA, NA, NA, NA, NA), each = 2, times = 5),
         reader2 = rep(c(NA, NA, NA, NA, NA, 1, 1, 0, 1, 2), each = 2, times = 5),
         reader3 = rep(c(1, 1, 1, 0, 2, NA, NA, NA, NA, NA), each = 2, times = 5),
         reader4 = rep(c(NA, NA, NA, NA, NA, 2, 1, 0, 1, 2), each = 2, times = 5),
         reader5 = rep(c(NA, NA, NA, NA, NA, 2, 2, 0, 1, 2), each = 2, times = 5))

The ultimate aim is to estimate intraclass correlations (using the `ICC` function in the psych package) between readers on each day. The ideal output would be a single data frame with the ICCs (and 95% confidence intervals) for each day, to allow plotting.

This answer has been helpful, but only applies to situations with exactly two readers.

Where I am stuck:

Firstly, for each day, dropping the columns for readers who did not classify any tests (I think this is needed because `ICC` cannot handle readers with no observations).

df %>%
  group_by(day) %>%
  nest()
  #something here to drop non-readers
  select_if(colSums(!is.na(.)) > 0)
  #doesn't work. Need to slice into separate data frames?

Secondly, how to extract ICCs and 95% CIs into a single tidy data frame?

df %>%
  group_by(day) %>%
  nest() %>%
  #something here to split data by day
  do(ICC(.)) %>%
  tidy()
  #not working
Peter MacPherson

1 Answer


I am not familiar with ICC or the expected output, but you could try it this way. First split the data by day, then drop the columns for readers who did not classify any tests on that day, and finally calculate the ICCs.

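res <- lapply(split(df, df$day), function(x) {
  # Reshape to long format, number the tests within each reader,
  # drop missing classifications, then reshape back to wide format
  tmp <- x %>%
    gather(key, value, -day) %>%
    group_by(key) %>%
    mutate(test = 1:n()) %>%
    filter(!is.na(value)) %>%
    spread(key, value) %>%
    select(starts_with("reader"))
  # Return only the results data frame, not the whole psych/ICC object
  ICC(as.matrix(tmp))$results
})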
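
The column-dropping step on its own could also be sketched in base R, without reshaping. This is a minimal sketch, not part of the approach above: `drop_empty_cols` and `one_day` are hypothetical names used only for illustration, and `one_day` stands in for the reader columns of a single day.

```r
# Hypothetical helper: keep only the columns that contain at least one
# non-missing value. drop = FALSE keeps the result a data frame even if
# a single column remains.
drop_empty_cols <- function(x) {
  x[, colSums(!is.na(x)) > 0, drop = FALSE]
}

# Toy data for one day: reader2 did not classify any tests
one_day <- data.frame(
  reader1 = c(1, 2, 0),
  reader2 = c(NA, NA, NA),
  reader3 = c(1, 1, 1)
)

kept <- drop_empty_cols(one_day)
names(kept)
```

Because the check is applied to one day's rows at a time, it would have to be called inside the per-day function (for example on each element of `split(df, df$day)`), not on the whole data frame at once.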

The resulting list of data frames can then be combined and plotted with the tidyverse.

res %>% 
  bind_rows(.id = "day") %>% 
  ggplot(aes(type, ICC)) +
     geom_col() +
     facet_wrap(~day)


Of course, you can do it all in one pipeline using map() from the purrr package.

library(tidyverse)
library(psych)
df %>% 
  split(.$day) %>% 
   map(~gather(.,key, value, -day)) %>% 
   map(~group_by(.,key)) %>% 
   map(~mutate(.,test=1:n())) %>%   
   map(~filter(.,!is.na(value))) %>% 
   map(~spread(.,key,value)) %>% 
   map(~select(.,starts_with("reader"))) %>% 
   map(~ICC(as.matrix(.))$results) %>% 
   bind_rows(.id = "day")  
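
Since the question also asks for the 95% confidence intervals, here is a hedged sketch of how they might be pulled into the same tidy data frame and plotted. It assumes the results table returned by `ICC()` has columns named `type`, `ICC`, `lower bound` and `upper bound` (check `names(ICC(...)$results)` in your installed psych version). The names `icc_by_day` and `ci_plot`, and the choice to plot only `ICC2`, are illustrative and not part of the original answer.

```r
library(psych)    # ICC()
library(ggplot2)  # plotting

# Rebuild the example data from the question in base R
df <- data.frame(day = rep(1:10, each = 10))
df$reader1 <- rep(c(1, 2, 0, 0, 2, NA, NA, NA, NA, NA), each = 2, times = 5)
df$reader2 <- rep(c(NA, NA, NA, NA, NA, 1, 1, 0, 1, 2), each = 2, times = 5)
df$reader3 <- rep(c(1, 1, 1, 0, 2, NA, NA, NA, NA, NA), each = 2, times = 5)
df$reader4 <- rep(c(NA, NA, NA, NA, NA, 2, 1, 0, 1, 2), each = 2, times = 5)
df$reader5 <- rep(c(NA, NA, NA, NA, NA, 2, 2, 0, 1, 2), each = 2, times = 5)

# One ICC table per day, stacked into a single data frame
icc_by_day <- do.call(rbind, lapply(split(df, df$day), function(x) {
  readers <- x[, grepl("^reader", names(x)), drop = FALSE]
  # Drop readers with no classifications on this day
  readers <- readers[, colSums(!is.na(readers)) > 0, drop = FALSE]
  out <- ICC(as.matrix(readers))$results
  # Column names below are an assumption about the psych results table
  data.frame(day   = x$day[1],
             type  = out$type,
             icc   = out$ICC,
             lower = out[["lower bound"]],
             upper = out[["upper bound"]])
}))
rownames(icc_by_day) <- NULL

# Point estimate with confidence interval per day, for one ICC type
ci_plot <- ggplot(subset(icc_by_day, type == "ICC2"),
                  aes(x = factor(day), y = icc)) +
  geom_pointrange(aes(ymin = lower, ymax = upper)) +
  labs(x = "day", y = "ICC2 (95% CI)")
```

Printing `ci_plot` draws the figure. Which of the six ICC types is appropriate depends on the study design, so the `type == "ICC2"` filter should be adjusted accordingly.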
Roman
  • Thanks - looks good, but doesn't seem to work for me, either on this MRE, or on the real data. Error: `Error in bind_rows_(x, .id) : Argument 1 must be a data frame or a named atomic vector, not a psych/ICC`. Perhaps my version of dplyr (0.7.2)? – Peter MacPherson Oct 10 '17 at 14:09
  • sorry a small mistake, see my edits. Must be `ICC(as.matrix(tmp))$results` – Roman Oct 10 '17 at 14:11