Data
I have the output of a driving simulator experiment. Below is some data for 2 different drivers changing lanes. The first data set is foo:
Data Set # 1
> foo
# A tibble: 4 x 7
  file.ID                 lcf TL         lead_veh_TL foll_veh_TL Start_Frame_CS End_Frame1_CS
  <chr>                 <int> <chr>      <chr>       <chr>                <dbl>         <dbl>
1 Cars_20160601_01.hdf5 43207 right_lane StarT7      <NA>                 42899         43476
2 Cars_20160601_01.hdf5 43207 right_lane <NA>        ditiExpeon6          42899         43476
3 Cars_20160601_02.hdf5 52843 left_lane  BMWC10      <NA>                 52498         53211
4 Cars_20160601_02.hdf5 52843 left_lane  <NA>        owT8Yell             52498         53211
where,
* file.ID = Unique ID of a driving scenario
* lcf = Time frame # when the vehicle touched the lane marking
* TL = Target lane (where the vehicle ends up after the lane change)
* lead_veh_TL = Name of the lead vehicle in the target lane
* foll_veh_TL = Name of the following vehicle in the target lane
* Start_Frame_CS = Time frame # when the lane change started in the origin lane
* End_Frame1_CS = Time frame # when the lane change ended in the target lane
Here's an illustration of the file.ID == "Cars_20160601_01.hdf5" scenario (image not included):
Data Set # 2
The second data frame contains the speed of all vehicles at all times (including the times when a lane change occurred). Here are the first few rows:
> bar
# A tibble: 205,231 x 5
   file.ID               frames lane      ADO_name speed.kph
   <chr>                  <int> <chr>     <chr>        <dbl>
 1 Cars_20160601_01.hdf5  35002 left_lane BMWC10    80.62273
 2 Cars_20160601_01.hdf5  35003 left_lane BMWC10    80.72590
 3 Cars_20160601_01.hdf5  35004 left_lane BMWC10    80.83455
 4 Cars_20160601_01.hdf5  35005 left_lane BMWC10    80.94342
 5 Cars_20160601_01.hdf5  35006 left_lane BMWC10    81.05671
 6 Cars_20160601_01.hdf5  35007 left_lane BMWC10    81.17065
 7 Cars_20160601_01.hdf5  35008 left_lane BMWC10    81.28705
 8 Cars_20160601_01.hdf5  35009 left_lane BMWC10    81.40385
 9 Cars_20160601_01.hdf5  35010 left_lane BMWC10    81.52023
10 Cars_20160601_01.hdf5  35011 left_lane BMWC10    81.63548
# ... with 205,221 more rows
where,
* frames = Time frame #
* lane = Current lane
* ADO_name = Name of the vehicle (it includes both lead and following vehicles in the target lane)
* speed.kph = Speed of the vehicle in the current time frame
The bar data set is too large to reproduce here in full, because it contains both lane-change and non-lane-change time frames, and all of them are needed for this question. So I have uploaded bar to Google Drive. You can download it here: https://drive.google.com/open?id=0ByvW4Hq_6a56dnIxYWh6M2ZRTUE (csv file)
Code to load the csv file into bar:
library(tibble)
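# stringsAsFactors = FALSE keeps file.ID, lane and ADO_name as <chr>, as in the printout above
bar <- as_tibble(read.csv("bar.csv", header = TRUE, stringsAsFactors = FALSE))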
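Since foo has only four rows, it can be recreated directly so that the example is reproducible together with bar.csv. This is a minimal sketch: the values are copied verbatim from the foo printout above, and the column types follow the type row shown there.

```r
library(tibble)

# foo rebuilt by hand from the printout above (4 rows, 7 columns).
# lcf is integer; the two frame columns are doubles, as in the printout.
foo <- tribble(
  ~file.ID,                ~lcf,   ~TL,          ~lead_veh_TL,  ~foll_veh_TL,  ~Start_Frame_CS, ~End_Frame1_CS,
  "Cars_20160601_01.hdf5", 43207L, "right_lane", "StarT7",      NA_character_, 42899,           43476,
  "Cars_20160601_01.hdf5", 43207L, "right_lane", NA_character_, "ditiExpeon6", 42899,           43476,
  "Cars_20160601_02.hdf5", 52843L, "left_lane",  "BMWC10",      NA_character_, 52498,           53211,
  "Cars_20160601_02.hdf5", 52843L, "left_lane",  NA_character_, "owT8Yell",    52498,           53211
)
```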
What I want to do
I want to use the bar and foo data sets to:
1. Extract the speeds of the lead and following vehicles at the START FRAME of the LANE CHANGE (Start_Frame_CS)
2. Extract the speeds of the lead and following vehicles at the LANE CHANGE FRAME (lcf)
3. Extract the speeds of the lead and following vehicles at the END FRAME of the LANE CHANGE (End_Frame1_CS)
4. Extract the mean speed of the lead and following vehicles during the lane change, i.e. the mean of ALL the speeds between Start_Frame_CS and End_Frame1_CS, inclusive
What I have tried
I can do this manually by using dplyr::left_join multiple times. This is how I extract the speeds of lead_veh_TL at lcf and Start_Frame_CS:
Lead Veh Speed at Lane Change Frame
library(dplyr)
lead_veh_TL_lcf <- foo %>%
  select(-ends_with("CS"), -foll_veh_TL) %>%
  # match on scenario, frame, lane and vehicle name
  left_join(x = ., y = bar,
            by = c("file.ID" = "file.ID", "lcf" = "frames",
                   "TL" = "lane", "lead_veh_TL" = "ADO_name")) %>%
  # drop the rows that describe the following vehicle
  filter(!is.na(lead_veh_TL)) %>%
  rename(speed.kph_LV_TL_lcf = speed.kph)
> lead_veh_TL_lcf
# A tibble: 2 x 5
  file.ID                 lcf TL         lead_veh_TL speed.kph_LV_TL_lcf
  <chr>                 <int> <chr>      <chr>                     <dbl>
1 Cars_20160601_01.hdf5 43207 right_lane StarT7                 79.54961
2 Cars_20160601_02.hdf5 52843 left_lane  BMWC10                103.71717
Lead Veh Speed at Start Frame
lead_veh_TL_SF <- foo %>%
  select(-lcf, -foll_veh_TL, -End_Frame1_CS) %>%
  # same join as above, keyed on the start frame instead of lcf
  left_join(x = ., y = bar,
            by = c("file.ID" = "file.ID", "Start_Frame_CS" = "frames",
                   "TL" = "lane", "lead_veh_TL" = "ADO_name")) %>%
  filter(!is.na(lead_veh_TL)) %>%
  rename(speed.kph_LV_TL_SF = speed.kph)
> lead_veh_TL_SF
# A tibble: 2 x 5
  file.ID               TL         lead_veh_TL Start_Frame_CS speed.kph_LV_TL_SF
  <chr>                 <chr>      <chr>                <dbl>              <dbl>
1 Cars_20160601_01.hdf5 right_lane StarT7               42899           79.54841
2 Cars_20160601_02.hdf5 left_lane  BMWC10               52498          102.87223
Mean Speed of Lead Vehicle
foo_mean_LV <- bar %>%
  left_join(x = ., y = foo %>% select(-lcf, -foll_veh_TL),
            by = c("file.ID" = "file.ID")) %>%
  # keep only lead-vehicle rows inside the lane-change window;
  # filter() works row by row, so no grouping is needed here
  filter(frames >= Start_Frame_CS & frames <= End_Frame1_CS,
         ADO_name == lead_veh_TL) %>%
  group_by(file.ID, lead_veh_TL) %>%
  summarize(Start_Frame_CS = unique(Start_Frame_CS),
            End_Frame1_CS = unique(End_Frame1_CS),
            mean_sp_LV = mean(speed.kph),
            sd_sp_LV = sd(speed.kph)) %>%
  ungroup()
> foo_mean_LV
# A tibble: 2 x 6
  file.ID               lead_veh_TL Start_Frame_CS End_Frame1_CS mean_sp_LV    sd_sp_LV
  <chr>                 <chr>                <dbl>         <dbl>      <dbl>       <dbl>
1 Cars_20160601_01.hdf5 StarT7               42899         43476   79.54532 0.006486832
2 Cars_20160601_02.hdf5 BMWC10               52498         53211  100.94923 1.608811109
For the following vehicle, I can simply replace lead_veh_TL in the above code with foll_veh_TL.
Problem
As you can see, writing code repeatedly in this manner is tedious and error-prone. I want a function where I only provide the time frame and the type of vehicle (lead/following) and everything else stays the same. However, I can't seem to find a way to write such a function. I found only one related answer here, but it doesn't solve my problem.
How can I write an efficient function to get the desired results? My original data set has many more variables besides speed.kph.
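To make the request concrete, the kind of interface meant here could look roughly like the sketch below. It is not a tested solution for the full data: the function names speed_at_frame and mean_speed_during and their arguments are hypothetical, it assumes dplyr >= 1.0 (for all_of(), across() and the .groups argument), and it only handles speed.kph. Like the mean-speed code above, mean_speed_during does not match on lane.

```r
library(dplyr)

# Hypothetical helper: speed of one vehicle type at one frame column of foo.
#   frame_col: "Start_Frame_CS", "lcf" or "End_Frame1_CS"
#   veh_col:   "lead_veh_TL" or "foll_veh_TL"
speed_at_frame <- function(foo, bar, frame_col, veh_col) {
  foo %>%
    filter(!is.na(.data[[veh_col]])) %>%
    select(file.ID, TL, all_of(c(veh_col, frame_col))) %>%
    # names are columns of foo, values are the matching columns of bar
    left_join(bar,
              by = setNames(c("file.ID", "frames", "lane", "ADO_name"),
                            c("file.ID", frame_col, "TL", veh_col)))
}

# Hypothetical helper: mean and sd of speed between Start_Frame_CS and
# End_Frame1_CS (inclusive) for one vehicle type.
mean_speed_during <- function(foo, bar, veh_col) {
  foo %>%
    filter(!is.na(.data[[veh_col]])) %>%
    select(file.ID, all_of(veh_col), Start_Frame_CS, End_Frame1_CS) %>%
    inner_join(bar,
               by = setNames(c("file.ID", "ADO_name"),
                             c("file.ID", veh_col))) %>%
    filter(frames >= Start_Frame_CS, frames <= End_Frame1_CS) %>%
    group_by(across(all_of(c("file.ID", veh_col,
                             "Start_Frame_CS", "End_Frame1_CS")))) %>%
    summarise(mean_sp = mean(speed.kph),
              sd_sp = sd(speed.kph),
              .groups = "drop")
}
```

With these, a call such as speed_at_frame(foo, bar, "lcf", "foll_veh_TL") or mean_speed_during(foo, bar, "lead_veh_TL") would replace one of the hand-written joins above. Whether this is efficient, and how best to extend it to the other variables, is still what I am asking.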