I have data on trains departing a train station every day over a month. The data includes the departure time and the number of passengers travelling.

I have a separate dataset which estimates, in 5-minute intervals, the percentage of passengers that arrive during the 2 hours prior to departure.

I'd like to apply the percentage distribution to each row of data, essentially generating 24 lines (one for each 5-minute interval) for each train that departs.

I'm unsure of the method to achieve this, but the output I'm looking for should essentially tell you what time passengers arrive before their train departs.

I'd appreciate any help with this.

Thank you.

Train no.   Passengers   Departure
11111       750          2018-01-01 07:00:00
11112       900          2018-01-01 08:00:00

Hours before departure   Percentage arriving
02:00:00                 0.1%
01:55:00                 0.5%
01:50:00                 1.1%
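
To make the intended output concrete, here is a minimal sketch of the expansion, written in Python purely for illustration. It assumes the two tables are held as plain lists, that each "hours before departure" value marks the start of its 5-minute interval, and that percentages are stored as fractions. The names `trains`, `profile` and `expand` are made up for this example, and only the three profile rows shown above are included rather than the full 24.

```python
from datetime import datetime, timedelta

# Sample rows from the question: (train number, passengers, departure time).
trains = [
    ("11111", 750, datetime(2018, 1, 1, 7, 0)),
    ("11112", 900, datetime(2018, 1, 1, 8, 0)),
]

# Arrival profile: (minutes before departure, share of passengers).
# Assumption: only the three rows shown in the question are listed here;
# the real profile would have 24 rows whose shares sum to 1.
profile = [
    (120, 0.001),  # 02:00:00 before departure, 0.1%
    (115, 0.005),  # 01:55:00 before departure, 0.5%
    (110, 0.011),  # 01:50:00 before departure, 1.1%
]


def expand(trains, profile):
    """Pair every train with every profile interval (a cross join)."""
    rows = []
    for train_no, passengers, departure in trains:
        for minutes_before, share in profile:
            rows.append({
                "train_no": train_no,
                # Clock time at which this interval starts.
                "arrival_time": departure - timedelta(minutes=minutes_before),
                # Expected head count for the interval (may be fractional).
                "expected_passengers": passengers * share,
            })
    return rows


arrivals = expand(trains, profile)
for row in arrivals:
    print(row["train_no"], row["arrival_time"], round(row["expected_passengers"], 2))
```

With the full 24-row profile this would give 24 lines per train. The expected counts are fractional, so whether to round them, or to treat them as rates for a random arrival model, is a separate choice that this sketch leaves open.
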
digidrago
  • Yes, this is quite possible, but it's not possible to give you any specific code without some sample data. Please edit your question to include a sample of your train data and passenger data (as text, not as images). You may find the `dput` function helpful for that purpose. – Allan Cameron Nov 21 '22 at 19:25
  • This is a statistical modeling question, not a programming one, so to me it looks off-topic for stackoverflow. It may also be more challenging than you think, because it looks like a non-homogeneous Poisson process may apply. – pjs Nov 21 '22 at 19:26
  • Thanks both - I've edited with a few lines of example data. Also happy to move this question to another community, could you advise where might be more suitable? – digidrago Nov 21 '22 at 20:20
  • I'd suggest stats.stackexchange.com – pjs Nov 21 '22 at 20:50
