I have data on trains departing a train station every day over a month. The data includes the departure time and the number of passengers travelling.
I have a separate dataset which estimates, in 5-minute intervals, the percentage of passengers that arrive over the 2 hours prior to departure.
I'd like to apply the percentage distribution to each row of data, essentially generating 24 lines (1 for each 5-minute interval) for each train that departs.
I'm unsure of the method to achieve this, but the output I'm looking for should tell you, for each train, the times at which passengers arrive before it departs.
I'd appreciate any help with this.
Thank you.

Train no. | Passengers | Departure |
---|---|---|
11111 | 750 | 2018-01-01 07:00:00 |
11112 | 900 | 2018-01-01 08:00:00 |

Hours before departure | Percentage arriving |
---|---|
02:00:00 | 0.1% |
01:55:00 | 0.5% |
01:50:00 | 1.1% |
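
For reference, here is a rough sketch of the kind of result I'm after, assuming pandas (version 1.2 or later, for `how="cross"`) is an acceptable tool. The column names (`train_no`, `passengers`, `departure`, `before_departure`, `pct_arriving`, `arrival_time`, `passengers_arriving`) are made up for illustration, and only the three intervals shown above are included; the real profile would have one row per 5-minute interval. It pairs every train with every interval using a cross join, then subtracts the offset from the departure time.

```python
import pandas as pd

# Sample trains, copied from the first table above.
trains = pd.DataFrame({
    "train_no": [11111, 11112],
    "passengers": [750, 900],
    "departure": pd.to_datetime(["2018-01-01 07:00:00", "2018-01-01 08:00:00"]),
})

# Arrival profile: only the three intervals shown above (assumed column names).
profile = pd.DataFrame({
    "before_departure": pd.to_timedelta(["02:00:00", "01:55:00", "01:50:00"]),
    "pct_arriving": [0.1, 0.5, 1.1],  # percentages, not fractions
})

# Cross join: one output row per (train, interval) combination.
arrivals = trains.merge(profile, how="cross")

# Clock time at which each interval's passengers arrive.
arrivals["arrival_time"] = arrivals["departure"] - arrivals["before_departure"]

# Estimated passengers arriving in each interval (fractional, not rounded).
arrivals["passengers_arriving"] = (
    arrivals["passengers"] * arrivals["pct_arriving"] / 100
)

print(arrivals[["train_no", "arrival_time", "passengers_arriving"]])
```

With the full profile this would give one row per train per interval. The passenger counts come out fractional, so they would need rounding if whole numbers are required.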