1

I have a df in r with numerous records with the below format, with 'arrival_time' values for a 12 hour period'.

id arrival_time wait_time_value
1 2020-02-20 12:02:00 10
2 2020-02-20 12:04:00 5
99900 2020-02-20 23:47:00 8
10000 2020-02-20 23:59:00 21

I would like to create a new df that has a row for each 15 minute slot of the arrival time period and the wait_time_value of the record with the earliest arrival time in that slot. So, in the above example, the first and last row of the new df would look like:

id period_start wait_time_value
1 2020-02-20 12:00:00 10
48 2020-02-20 23:45:00 8

I have used the below code to achieve this for the mean average wait time for all records in each 15 minute range, but i'm not sure how to select the value for the earliest record?


df$period_start <- align.time(df$arrival_time- 899, n = 60*15)

avgwait_df <- aggregate(wait_time_value ~ period_start, df, mean)
Ben G
  • 95
  • 9
  • Can you provide data in a reproducible format which is easier to copy? Read about [how to give a reproducible example](http://stackoverflow.com/questions/5963269) – Ronak Shah May 19 '21 at 11:47

2 Answers2

2

Use DataFrame.resample with GroupBy.first, remove only NaNs and convert to DataFrame:

df['arrival_time'] = pd.to_datetime(df['arrival_time'])

df = (df.resample('15Min', on='arrival_time')['wait_time_value']
        .first()
        .dropna()
        .reset_index(name='wait_time_value'))
print (df)
         arrival_time  wait_time_value
0 2020-02-20 12:00:00             10.0
1 2020-02-20 23:45:00              8.0
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
0

Using dplyr:

df %>%
   group_by(period_start) %>%
   summarise(wait_time = min(wait_time_value))
Claudio
  • 1,229
  • 6
  • 11
  • Would this not provide the minimum wait_time_value rather than the value for the earliest arrival time? – Ben G May 19 '21 at 10:45
  • My understanding from your question was that that was what you wanted. You can get the earliest arrival time by substituting `wait_time_value` with `arrival_time` in the `min` function. – Claudio May 19 '21 at 11:05