Merge data frames by time interval in R

Question

I have two data frames. One is an eye-tracking data frame with subject, condition, timestamp, x position, and y position columns; it has over 400,000 rows. Here's a toy data set as an example:

   subid condition time xpos ypos
1      1         1 1.40  195  140
2      1         1 2.50  138  147
3      1         1 3.40  140  162
4      1         1 4.10  188  150
5      1         2 1.10  131  194
6      1         2 2.10  149  111

eyedata <- data.frame(subid = rep(1:2, each = 8),
                      condition = rep(rep(1:2, each = 4), 2),
                      time = c(1.4, 2.5, 3.4, 4.1,
                               1.1, 2.1, 3.23, 4.44,
                               1.33, 2.3, 3.11, 4.1,
                               .49, 1.99, 3.01, 4.2),
                      xpos = round(runif(n = 16, min = 100, max = 200)),
                      ypos = round(runif(n = 16, min = 100, max = 200)))

I also have a data frame with subject, condition, trial number, and the trial's begin and end times. It looks like this:

   subid condition trial begin end
1      1         1     1  1.40 2.4
2      1         1     2  2.50 3.2
3      1         1     2  3.21 4.5
4      1         2     1  1.10 1.6
5      1         2     2  2.10 3.3
6      1         2     2  3.40 4.1
7      2         1     1  0.50 1.1
8      2         1     1  1.44 2.9
9      2         1     2  2.97 3.3
10     2         2     1  0.35 1.9
11     2         2     1  2.12 4.5
12     2         2     2  3.20 6.3

trials <- data.frame(subid = rep(1:2, each = 6),
                     condition = rep(rep(1:2, each = 3), 2),
                     trial = c(rep(c(1, rep(2, 2)), 2), rep(c(rep(1, 2), 2), 2)),
                     begin = c(1.4, 2.5, 3.21,
                               1.10, 2.10, 3.4,
                               .50, 1.44, 2.97,
                               .35, 2.12, 3.20),
                     end = c(2.4, 3.2, 4.5,
                             1.6, 3.3, 4.1,
                             1.1, 2.9, 3.3,
                             1.9, 4.5, 6.3))

The number of trials per condition varies. I want to add a column to the eye-tracking data frame that gives the trial whose begin/end interval (for the same subject and condition) contains each row's timestamp. The time intervals do not overlap, but many eye-tracking rows will fall between trials and therefore belong to no trial. In the end I'd like a data frame like this:

subid condition trial time xpos ypos
    1         1     1 1.40  195  140
    1         1     2 2.50  138  147
    1         1     2 3.40  140  162
    1         1     2 4.10  188  150
    1         2     1 1.10  131  194
    1         2     2 2.10  149  111

I've seen data.table rolling joins, but would prefer a solution with dplyr or fuzzyjoin. Thanks in advance.

Is this just an example of the structure of the result, or is it the full expected result from the given data? — Andrew Lavers, Aug 11 '17 at 01:53
As @epi99 pointed out, your output and your explanation of what the output data frame should contain do not match. If you can be precise about what you need and what you have already tried, we can help. — i.n.n.m, Aug 11 '17 at 03:33
Sorry, this is just an example of what the result could look like. I'm not sure how the desired result doesn't line up with the eyedata example, except that there will be many (probably hundreds of) rows at the beginning with trial labeled `none`, and then many rows labeled 1 and 2 for each condition. — Spencer Castro, Aug 11 '17 at 07:08
OK, I think I've fixed the discrepancies. For each row of the eye-tracking data, I'd like to check whether `time` falls between `begin` and `end` for a trial with the same `subid` and `condition`, and if it does, add that trial's `trial` number as its own column in the eye-tracking data frame. — Spencer Castro, Aug 11 '17 at 07:31
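The rule in this comment translates almost word for word into a row-by-row lookup. Below is a minimal base-R sketch, assuming the toy `eyedata` and `trials` frames from the question (rebuilt here without the random `xpos`/`ypos` columns so the block runs on its own). `find_trial` is a hypothetical helper name, and returning `NA` rather than the string `none` for rows outside every trial is an assumption that keeps the column numeric. If windows overlap, it takes the first matching trial. Looping over 400,000 rows like this will be slow, so treat it as a statement of the rule rather than a production approach.

```r
# Toy data from the question; xpos/ypos are omitted because they don't affect matching
eyedata <- data.frame(
  subid     = rep(1:2, each = 8),
  condition = rep(rep(1:2, each = 4), 2),
  time      = c(1.40, 2.50, 3.40, 4.10, 1.10, 2.10, 3.23, 4.44,
                1.33, 2.30, 3.11, 4.10, 0.49, 1.99, 3.01, 4.20)
)
trials <- data.frame(
  subid     = rep(1:2, each = 6),
  condition = rep(rep(1:2, each = 3), 2),
  trial     = c(1, 2, 2, 1, 2, 2, 1, 1, 2, 1, 1, 2),
  begin     = c(1.40, 2.50, 3.21, 1.10, 2.10, 3.40, 0.50, 1.44, 2.97, 0.35, 2.12, 3.20),
  end       = c(2.4, 3.2, 4.5, 1.6, 3.3, 4.1, 1.1, 2.9, 3.3, 1.9, 4.5, 6.3)
)

# Hypothetical helper: the trial whose [begin, end] window contains time t
# for the same subject and condition, or NA if no window contains it.
# If several windows match (overlap), the first one in `trials` wins.
find_trial <- function(s, cond, t, trials) {
  hit <- trials$trial[trials$subid == s & trials$condition == cond &
                        trials$begin <= t & t <= trials$end]
  if (length(hit) == 0) NA_real_ else hit[1]
}

# Apply the lookup to every eye-tracking row
eyedata$trial <- mapply(find_trial, eyedata$subid, eyedata$condition,
                        eyedata$time, MoreArgs = list(trials = trials))
eyedata
```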

score 4 · Accepted Answer · answered Aug 11 '17 at 09:48

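Here's what I tried, but I can't figure out the discrepancies, so this is likely an incomplete answer. Rows 12 and 13 of the result come from overlapping trial windows: for subid 2, condition 2, trial 1 runs from 2.12 to 4.5 and trial 2 from 3.20 to 6.3, so time 4.20 falls inside both. Also, when using random generation functions such as `runif`, please call `set.seed`; here `xpos` and `ypos` have no bearing on the result, so it's not an issue.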
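One way to confirm which windows overlap is to compare each trial's `begin` with the latest `end` among the earlier-starting trials in the same subject and condition. A base-R sketch, assuming the `trials` data frame from the question (rebuilt below so the block runs on its own); `prev_end` is just a hypothetical column name, and touching endpoints count as an overlap because the windows are treated as inclusive, matching the `>=`/`<=` filter in this answer.

```r
trials <- data.frame(
  subid     = rep(1:2, each = 6),
  condition = rep(rep(1:2, each = 3), 2),
  trial     = c(1, 2, 2, 1, 2, 2, 1, 1, 2, 1, 1, 2),
  begin     = c(1.40, 2.50, 3.21, 1.10, 2.10, 3.40, 0.50, 1.44, 2.97, 0.35, 2.12, 3.20),
  end       = c(2.4, 3.2, 4.5, 1.6, 3.3, 4.1, 1.1, 2.9, 3.3, 1.9, 4.5, 6.3)
)

# Sort windows by start time within each subject/condition
trials <- trials[order(trials$subid, trials$condition, trials$begin), ]

# Latest end time among the earlier windows of the same group (-Inf for the first one)
trials$prev_end <- ave(trials$end, trials$subid, trials$condition,
                       FUN = function(e) c(-Inf, head(cummax(e), -1)))

# A window overlaps an earlier one if it starts before that one has ended
overlaps <- trials[trials$begin <= trials$prev_end, ]
overlaps
```

Using `cummax` rather than just the previous row's `end` also catches a short window nested inside a long earlier one.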

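library(dplyr)
# Pair each eye-tracking row with every trial of the same subid/condition,
# then keep only the pairs whose trial window contains `time`
eyedata %>%
  left_join(trials, by = c("subid", "condition")) %>%
  filter(time >= begin, time <= end)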

#    subid condition time xpos ypos trial begin end
# 1      1         1 1.40  143  101     1  1.40 2.4
# 2      1         1 2.50  152  173     2  2.50 3.2
# 3      1         1 3.40  185  172     2  3.21 4.5
# 4      1         1 4.10  106  119     2  3.21 4.5
# 5      1         2 1.10  155  165     1  1.10 1.6
# 6      1         2 2.10  169  154     2  2.10 3.3
# 7      1         2 3.23  166  134     2  2.10 3.3
# 8      2         1 2.30  197  171     1  1.44 2.9
# 9      2         1 3.11  140  135     2  2.97 3.3
# 10     2         2 0.49  176  139     1  0.35 1.9
# 11     2         2 3.01  187  180     1  2.12 4.5
# 12     2         2 4.20  147  176     1  2.12 4.5
# 13     2         2 4.20  147  176     2  3.20 6.3
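If you are on dplyr 1.1.0 or later (an assumption; `join_by()` and its `between()` helper don't exist in earlier versions), the interval condition can go straight into the join. A `left_join()` then keeps the eye-tracking rows that fall outside every trial, with `trial` set to `NA`, instead of dropping them the way the `filter()` step above does. A timestamp inside two overlapping windows, like 4.20 here, still produces two rows. The data frames are rebuilt below so the sketch runs on its own.

```r
library(dplyr)  # join_by() with between() needs dplyr >= 1.1.0

set.seed(1)  # xpos/ypos are random; the seed only makes the example reproducible
eyedata <- data.frame(
  subid     = rep(1:2, each = 8),
  condition = rep(rep(1:2, each = 4), 2),
  time      = c(1.40, 2.50, 3.40, 4.10, 1.10, 2.10, 3.23, 4.44,
                1.33, 2.30, 3.11, 4.10, 0.49, 1.99, 3.01, 4.20),
  xpos      = round(runif(16, min = 100, max = 200)),
  ypos      = round(runif(16, min = 100, max = 200))
)
trials <- data.frame(
  subid     = rep(1:2, each = 6),
  condition = rep(rep(1:2, each = 3), 2),
  trial     = c(1, 2, 2, 1, 2, 2, 1, 1, 2, 1, 1, 2),
  begin     = c(1.40, 2.50, 3.21, 1.10, 2.10, 3.40, 0.50, 1.44, 2.97, 0.35, 2.12, 3.20),
  end       = c(2.4, 3.2, 4.5, 1.6, 3.3, 4.1, 1.1, 2.9, 3.3, 1.9, 4.5, 6.3)
)

# Match on subid and condition, plus begin <= time <= end (between() is inclusive by default).
# left_join() keeps unmatched eye-tracking rows, with trial = NA.
labelled <- eyedata %>%
  left_join(trials, by = join_by(subid, condition, between(time, begin, end))) %>%
  select(subid, condition, trial, time, xpos, ypos)

labelled
```

If you need the literal label `none` from the comments, something like `mutate(labelled, trial = ifelse(is.na(trial), "none", trial))` would do it, at the cost of turning `trial` into a character column.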

Thanks, so far I think this works despite my poor example. Next time I'll remember to set.seed for meaningful columns. — Spencer Castro, Aug 11 '17 at 10:28
@SpencerCastro if this answer helped you solve your problem, you should at least consider upvoting it to reward the effort and time put in. — i.n.n.m, Aug 11 '17 at 14:22
I thought my reputation meant that my votes can't be counted yet. — Spencer Castro, Aug 11 '17 at 19:02
