
I am using R to manipulate a large dataset (`dataset`) that consists of 20,000+ rows. Three columns matter for this question: `Trial_Nr` (90 trials), `seconds` (increasing in 0.02-second increments), and `threat` (fixation to threat: 1 = yes, 0 = no, or NA). Within each trial, I need to find when the participant first fixates on threat (1) and how long it takes until they stop fixating on threat (0). In other words, within each trial I need to find the first `threat == 1` and the next `threat == 0` after it, and subtract their times. I am able to get the first threat with this code:

library(dplyr)

# First row in each trial where threat is 1
initalfixthreat <- dataset %>%
  group_by(Trial_Nr) %>%
  slice(which(threat == 1)[1])

I am stumped on how to get the subsequent `threat == 0` within the same trial.

Here is an example of the data (sorry, I don't know how to format it better):


So for Trial_Nr = 1, I would want 689.9 - 689.8 = 0.1 seconds. For Trial_Nr = 2, I would want 690.04 - 689.96 = 0.08 seconds.
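For reference, the calculation described above (first 1, then the next 0, subtract times) can be sketched directly in base R. This is a minimal sketch, not the method used in the answer: `time_to_unfixate` is a hypothetical helper name, it assumes rows are already sorted by `seconds` within each trial, and the sample values are the ones posted with the answer's data.

```r
# Hypothetical helper: time from the first threat == 1 to the first
# threat == 0 that follows it; NA rows are simply skipped by which().
time_to_unfixate <- function(seconds, threat) {
  first_on <- which(threat == 1)[1]                 # NA if the trial never hits threat
  if (is.na(first_on)) return(NA_real_)
  later_off <- which(threat == 0 & seq_along(threat) > first_on)[1]
  if (is.na(later_off)) return(NA_real_)            # never looked away afterwards
  seconds[later_off] - seconds[first_on]
}

# Sample values copied from the data posted in the answer
df <- data.frame(
  Trial_Nr = c(rep(1L, 10), rep(2L, 5)),
  seconds  = c(689.76, 689.78, 689.8, 689.82, 689.84, 689.86, 689.88,
               689.9, 689.92, 689.94, 689.96, 689.98, 690, 690.02, 690.04),
  threat   = c(0L, 0L, 1L, 1L, 1L, NA, NA, 0L, 1L, 0L, 1L, NA, NA, 1L, 0L)
)

# One value per trial, named by Trial_Nr
res <- sapply(split(df, df$Trial_Nr),
              function(d) time_to_unfixate(d$seconds, d$threat))
round(res, 2)  # trial 1: 0.10, trial 2: 0.08
```

Because `which()` ignores NA comparisons, a 0 that appears after a run of NA rows still counts, as in the Trial_Nr = 1 example.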

Please let me know if I was unclear and thank you all for your help!

  • Run `dput(dataset)` and add that to your post instead of the picture. That way, people can just copy and paste the data into their R session. – astrofunkswag Dec 21 '18 at 00:01

1 Answer


One approach is:

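library(dplyr)

df %>%
  group_by(Trial_Nr) %>%
  filter(!is.na(threat)) %>%                 # ignore rows with missing fixation data
  # flag = 1 on threat rows, -1 where threat switches from 1 to 0, 0/NA otherwise
  mutate(flag = ifelse(threat == 1, 1, threat - lag(threat))) %>%
  # keep only the first 1 (first fixation) and the first -1 (first look away)
  filter(abs(flag) == 1 & !duplicated(flag)) %>%
  # NA when the trial never looks away again after the first fixation
  summarise(timediff = ifelse(length(seconds) == 1, NA, diff(seconds)))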

# A tibble: 2 x 2
  Trial_Nr timediff
     <int>  <dbl>
1        1 0.1   
2        2 0.0800

Data:

df <- structure(list(Trial_Nr = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 2L, 2L, 2L, 2L, 2L), seconds = c(689.76, 689.78, 689.8, 689.82, 
689.84, 689.86, 689.88, 689.9, 689.92, 689.94, 689.96, 689.98, 
690, 690.02, 690.04), threat = c(0L, 0L, 1L, 1L, 1L, NA, NA, 
0L, 1L, 0L, 1L, NA, NA, 1L, 0L)), class = "data.frame", row.names = c(NA, 
-15L))
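To see why the `flag` / `duplicated()` combination picks the right two rows, here is a step-by-step trace on trial 1 alone. It is written in base R so it runs without dplyr: `prev` is a hand-rolled stand-in for `dplyr::lag(threat)` (shift down one row, pad with NA), and the NA rows are dropped by hand first, as `filter(!is.na(threat))` does.

```r
# Trial 1 from the sample data, with the two NA rows already removed
seconds <- c(689.76, 689.78, 689.8, 689.82, 689.84, 689.9, 689.92, 689.94)
threat  <- c(0L, 0L, 1L, 1L, 1L, 0L, 1L, 0L)

prev <- c(NA, threat[-length(threat)])        # stand-in for dplyr::lag(threat)
flag <- ifelse(threat == 1, 1, threat - prev) # 1 = on threat, -1 = just looked away
keep <- abs(flag) == 1 & !duplicated(flag)    # first 1 and first -1 only
keep[is.na(keep)] <- FALSE                    # filter() also drops NA conditions

flag                # NA 0 1 1 1 -1 1 -1
seconds[keep]       # the first fixation and the first look away
diff(seconds[keep])
```

A -1 can only appear directly after a 1, so the first -1 always comes after the first fixation, and `diff()` of the two surviving rows is the time asked for.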
Ritchie Sacramento
  • So when I do this, I get this error: Error in summarise_impl(.data, dots) : Column `timediff` must be length 1 (a summary value), not 0. Do you know why this might be? – Mary Smirnova Dec 21 '18 at 00:40
  • Probably because there are trials where threat is never fixated upon, or is never looked away from after the first fixation. See updated answer. – Ritchie Sacramento Dec 21 '18 at 00:55
  • Thanks Jay! Quick question, if you have time: could you explain what lag does? – Mary Smirnova Dec 21 '18 at 17:31
  • `dplyr::lag`, by default, returns the value of the previous row; in this case the previous value of threat is subtracted from the current value in order to detect a change in state. – Ritchie Sacramento Dec 21 '18 at 21:31
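A tiny illustration of the state-change idea in the last comment, using a hand-rolled base-R shift in place of `dplyr::lag()` so it runs without dplyr (with its default arguments, `lag(x)` shifts by one row and pads the first position with `NA`, which is what this line does):

```r
threat <- c(0L, 1L, 1L, 0L, 0L, 1L)
prev   <- c(NA, threat[-length(threat)])   # previous row: NA 0 1 1 0 0
change <- threat - prev                    # NA 1 0 -1 0 1
```

In `change`, 1 marks a switch onto threat, -1 a switch away, and 0 no change. The answer additionally forces every `threat == 1` row to 1 with `ifelse()`, so only the 1-to-0 switches end up as -1.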