I am working with a data set of changes over time and need to find the time at which the peak change occurs. I am running into a problem because some subjects have missing data (NAs).
Example:
library(dplyr)
Data <- structure(list(Subject = structure(c(1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 6L, 6L,
6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L,
6L, 6L), .Label = c("1", "10", "11", "12", "13", "14", "16",
"17", "18", "19", "2", "20", "21", "22", "23", "24", "25", "26",
"27", "28", "29", "3", "31", "32", "4", "5", "7", "8", "9"), class = "factor"),
Close = structure(c(1L, 1L, 2L, 2L, 1L, 1L, 2L, 2L, 1L, 1L,
2L, 2L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 2L, 1L,
1L, 2L, 2L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 2L
), .Label = c("High Predictability", "Low Predictability"
), class = "factor"), SOA = structure(c(2L, 1L, 2L, 1L, 2L,
1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L,
2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L,
1L, 2L, 1L, 2L, 1L), .Label = c("Long SOA", "Short SOA"), class = "factor"),
Time = c(-66.68, -66.68, -66.68, -66.68, -33.34, -33.34,
-33.34, -33.34, 0, 0, 0, 0, 33.34, 33.34, 33.34, 33.34, 66.68,
66.68, 66.68, 66.68, -66.68, -66.68, -66.68, -66.68, -33.34,
-33.34, -33.34, -33.34, 0, 0, 0, 0, 33.34, 33.34, 33.34,
33.34, 66.68, 66.68, 66.68, 66.68), Pcent_Chng = c(0.12314,
0.048254, -0.098007, 0.023216, 0.20327, 0.08338, -0.15157,
0.030008, 0.26442, 0.12019, -0.22878, 0.035547, 0.31849,
0.15488, -0.26887, 0.038992, 0.39489, 0.15112, -0.31185,
0.02144, NA, 0.046474, NA, 0.17541, NA, 0.14975, NA, 0.3555,
NA, -0.1736, NA, 0.72211, NA, -0.32201, NA, 1.0926, NA, -0.39551,
0.72211, 1.4406)), class = "data.frame", row.names = c(NA, -40L
), .Names = c("Subject", "Close", "SOA", "Time", "Pcent_Chng"
))
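A quick way to see where the NAs sit is to count them per Close x SOA cell. The sketch below is base R only. It re-types subject 14's rows from Data so it runs on its own (`s14`, `n_missing` and `all_missing` are illustrative names). In this excerpt, the High Predictability / Short SOA cell is entirely NA. The Low Predictability / Short SOA cell has only one non-missing value.

```r
# Subject 14's 20 rows from Data, re-typed so this snippet is self-contained.
s14 <- data.frame(
  Close = rep(c("High Predictability", "High Predictability",
                "Low Predictability", "Low Predictability"), times = 5),
  SOA   = rep(c("Short SOA", "Long SOA"), times = 10),
  Time  = rep(c(-66.68, -33.34, 0, 33.34, 66.68), each = 4),
  Pcent_Chng = c(NA, 0.046474, NA, 0.17541, NA, 0.14975, NA, 0.3555,
                 NA, -0.1736, NA, 0.72211, NA, -0.32201, NA, 1.0926,
                 NA, -0.39551, 0.72211, 1.4406)
)

# For each Close x SOA cell: how many values are NA, and is the whole cell NA?
n_missing   <- with(s14, tapply(Pcent_Chng, list(Close, SOA),
                                function(x) sum(is.na(x))))
all_missing <- with(s14, tapply(Pcent_Chng, list(Close, SOA),
                                function(x) all(is.na(x))))
print(n_missing)
print(all_missing)
```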
I get an error with the following attempt:
Data %>%
  group_by(Subject, Close, SOA) %>%
  summarize(Peak_Pcent = max(Pcent_Chng),
            Peak_Latency = Time[which.max(Pcent_Chng)])
The error is:
Error in summarise_impl(.data, dots) :
Column `Peak_Latency` must be length 1 (a summary value), not 0
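The length-0 result can be reproduced in base R without dplyr. This sketch assumes the failing groups are the ones where Pcent_Chng is entirely NA. The two vectors mirror subject 14's Short SOA groups, and `x_time`, `partial` and `all_na` are illustrative names.

```r
# Time grid for one Subject x Close x SOA group (5 time points).
x_time <- c(-66.68, -33.34, 0, 33.34, 66.68)

# which.max() discards NAs, so a partly missing group still yields a position.
partial <- c(NA, NA, NA, NA, 0.72211)
which.max(partial)                 # 5
x_time[which.max(partial)]         # 66.68

# When every value is NA there is no position to return.
all_na <- rep(NA_real_, 5)
which.max(all_na)                  # integer(0)
length(x_time[which.max(all_na)])  # 0, i.e. the "not 0" in the error

# max() differs: it propagates NA unless na.rm = TRUE is given.
max(partial)                       # NA
max(partial, na.rm = TRUE)         # 0.72211
```

In the original call, max() therefore yields NA (not an error) for such groups. Time[which.max(...)] yields a zero-length vector, which is why the error names Peak_Latency only.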
This seems to be caused by the NAs, which occur only in some SOA conditions. Filtering with complete.cases() is too aggressive on my actual data and removes too much of it.
Is there a workaround that ignores the NAs?
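One possible workaround, sketched with dplyr, handles the NAs inside each summary instead of dropping rows up front. It uses na.rm = TRUE for max() (which.max() already skips NAs) and returns NA_real_ for any group where every value is missing. The snippet re-creates Data compactly so it runs on its own. Subject is stored as character rather than factor here, an assumption that does not change the grouping. `peaks` is an illustrative name.

```r
library(dplyr)

# Compact re-creation of Data from the question (Subject as character).
Data <- data.frame(
  Subject = rep(c("1", "14"), each = 20),
  Close = rep(c("High Predictability", "High Predictability",
                "Low Predictability", "Low Predictability"), times = 10),
  SOA = rep(c("Short SOA", "Long SOA"), times = 20),
  Time = rep(rep(c(-66.68, -33.34, 0, 33.34, 66.68), each = 4), times = 2),
  Pcent_Chng = c(0.12314, 0.048254, -0.098007, 0.023216, 0.20327, 0.08338,
                 -0.15157, 0.030008, 0.26442, 0.12019, -0.22878, 0.035547,
                 0.31849, 0.15488, -0.26887, 0.038992, 0.39489, 0.15112,
                 -0.31185, 0.02144, NA, 0.046474, NA, 0.17541, NA, 0.14975,
                 NA, 0.3555, NA, -0.1736, NA, 0.72211, NA, -0.32201, NA,
                 1.0926, NA, -0.39551, 0.72211, 1.4406)
)

peaks <- Data %>%
  group_by(Subject, Close, SOA) %>%
  summarize(
    # Largest non-missing value, or NA when the whole group is missing
    # (max(na.rm = TRUE) alone would return -Inf with a warning there).
    Peak_Pcent = if (all(is.na(Pcent_Chng))) NA_real_
                 else max(Pcent_Chng, na.rm = TRUE),
    # which.max() ignores NAs; the guard keeps the result length 1
    # when every value in the group is NA.
    Peak_Latency = if (all(is.na(Pcent_Chng))) NA_real_
                   else Time[which.max(Pcent_Chng)]
  ) %>%
  ungroup()

print(peaks)
```

With this data, only subject 14's High Predictability / Short SOA row comes back with NA in both columns. Subject 14's Low Predictability / Short SOA group uses its single non-missing value (0.72211 at Time 66.68). To drop the all-NA groups afterwards rather than keep them as NA rows, a `filter(!is.na(Peak_Latency))` step could be appended.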