I'm applying the spread function from the tidyr package to a grouped dataset. Below is an example of my actual data. I can summarize the data using group_by and the summarise function in dplyr, which yields a small dataframe with no NAs. However, if I reformat the dataframe using spread, cells that once had no NAs suddenly have NAs. I tried ungrouping first, but that did not help.
I must be missing something really simple.
In the code below, table.ratings will have no NAs, but new.table does.
# Sample data
year <- c(0,0,0,0,1,1,1,1,2,2,2,2,
0,0,0,0,1,1,1,1,2,2,2,2)
question <- c('a','b','c','d','a','b','c','d','a','b','c','d',
'a','b','c','d','a','b','c','d','a','b','c','d')
rating <- c(3.22, 4.33, 2.33, 2.33, NA, 3.00,
2.33, 3.00, 5.00, 3.67, 4.33, NA,
5.00, 4.10, 3.67, 2.82, 5.00, 5.00,
2.33, 3.00, 2.33, 5.00, 2.33, 3.00)
version <- c(1,2,1,2,1,2,1,2,1,2,1,2,
1,2,1,2,1,2,1,2,1,2,1,2)
df <- data.frame(year, version, question, rating)
# Grouped dataframe with no NAs
table.ratings <- df %>%
select(year, question, rating, version) %>%
group_by(year, version, question) %>%
dplyr::summarise(count = n(),
`Average Score` = mean(rating, na.rm = TRUE))
# This simply moves the cells around but results in NAs
new.table <- spread(table.ratings, key = "question",
value = `Average Score`)