I have two data frames, trips
which are unique trips taken by bikes that have a unique id
, and intervals
, which show the locations of bikes ids at every 10 minutes. My objective is to remove records from intervals
if time
falls in between start
and finish
and bike_id
s are the same. Time is of posixCT
class and the original dataframes have hundreds of thousands of records
For example, the outcome of these two datasets below should be:
> trips
bike_id start finish
1 1 2017-11-22 15:52:36 2017-11-22 17:47:53
2 2 2017-11-22 16:05:44 2017-11-22 16:23:25
3 3 2017-11-22 16:31:06 2017-11-22 17:11:20
> intervals
time bike_id
3 2017-11-22 16:00:03 1
4 2017-11-22 16:10:03 1
5 2017-11-22 16:20:02 1
6 2017-11-22 16:30:02 1
7 2017-11-22 16:40:03 1
8 2017-11-22 16:50:02 1
9 2017-11-22 17:00:02 1
10 2017-11-22 17:10:02 1
11 2017-11-22 17:20:03 1
12 2017-11-22 17:30:03 1
13 2017-11-22 16:00:03 2
14 2017-11-22 16:10:03 2
15 2017-11-22 16:20:02 2
16 2017-11-22 16:30:02 2
17 2017-11-22 16:40:03 2
18 2017-11-22 16:50:02 2
19 2017-11-22 17:00:02 2
20 2017-11-22 17:10:02 2
21 2017-11-22 17:20:03 2
22 2017-11-22 17:30:03 2
23 2017-11-22 16:30:02 3
24 2017-11-22 16:40:03 3
25 2017-11-22 16:50:02 3
26 2017-11-22 17:00:02 3
27 2017-11-22 17:10:02 3
28 2017-11-22 17:20:03 3
29 2017-11-22 17:30:03 3
And the outcome
> outcome
time bike_id
13 2017-11-22 16:00:03 2
16 2017-11-22 16:30:02 2
17 2017-11-22 16:40:03 2
18 2017-11-22 16:50:02 2
19 2017-11-22 17:00:02 2
20 2017-11-22 17:10:02 2
21 2017-11-22 17:20:03 2
22 2017-11-22 17:30:03 2
23 2017-11-22 16:30:02 3
28 2017-11-22 17:20:03 3
29 2017-11-22 17:30:03 3
Not exactly sure where to start. Any suggestions on where to start with dplyr
or an apply
function would be appreciated!
Here is the sample data:
> dput(intervals)
structure(list(time = structure(c(1511384403.94561, 1511385003.17654,
1511385602.47887, 1511386202.99895, 1511386803.18361, 1511387402.98233,
1511388002.69461, 1511388602.5818, 1511389203.52712, 1511389803.652,
1511384403.94561, 1511385003.17654, 1511385602.47887, 1511386202.99895,
1511386803.18361, 1511387402.98233, 1511388002.69461, 1511388602.5818,
1511389203.52712, 1511389803.652, 1511386202.99895, 1511386803.18361,
1511387402.98233, 1511388002.69461, 1511388602.5818, 1511389203.52712,
1511389803.652), class = c("POSIXct", "POSIXt"), tzone = ""),
bike_id = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2,
2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3)), .Names = c("time",
"bike_id"), row.names = 3:29, class = "data.frame")
> dput(trips)
structure(list(bike_id = c(1, 2, 3), start = structure(c(1511383956,
1511384744, 1511386266), class = c("POSIXct", "POSIXt"), tzone = ""),
finish = structure(c(1511390873, 1511385805, 1511388680), class = c("POSIXct",
"POSIXt"), tzone = "")), .Names = c("bike_id", "start", "finish"
), row.names = c(NA, 3L), class = "data.frame")