0

I am quite stuck.

I have a set of data looking like this: enter image description here

I have a date and 0 is the number of baggage for that specific time interval.

I would like to create a 30 minute forward looking rolling time window, and for each time period I need the flight name within this time window along with the number of baggage.

I want the output to be in a dictionary or data frame where keys is the time period.

I have the following code where I tried some different things. But I can't get the result I want. And the following code is not forward looking.

# rolling sum
df_temp = df_conflict[['sta+time_to_infeed_left',0]].rolling('1800s', on='sta+time_to_infeed_left',min_periods=0).sum().reset_index()
df_temp.sort_values('sta+time_to_infeed_left').head(45)

Is this something you can help with? :-)

Example of data

    ArrivalFlightName   sta sta+time_to_infeed_left sta+time_to_infeed_right    idx 0
18  Flightname402   2019-12-09 20:00:00 2019-12-09 20:01:05.663474700   2019-12-09 20:03:55.006119960   21  1
24  Flightname411   2019-12-10 08:20:00 2019-12-10 08:22:28.606811160   2019-12-10 08:25:00.669594660   27  8
11  Flightname376   2019-12-10 09:25:00 2019-12-10 09:27:26.878824960   2019-12-10 09:30:16.221470220   14  1
19  Flightname404   2019-12-10 10:10:00 2019-12-10 10:12:00.095039220   2019-12-10 10:14:49.437684480   22  1
5   Flightname362   2019-12-10 10:10:00 2019-12-10 10:12:36.382748940   2019-12-10 10:15:25.725394200   8   3
2   Flightname349   2019-12-10 10:15:00 2019-12-10 10:17:05.278997760   2019-12-10 10:19:54.621643020   3   4
8   Flightname368   2019-12-10 10:30:00 2019-12-10 10:32:19.966880280   2019-12-10 10:35:09.309525540   11  1
23  Flightname410   2019-12-10 11:20:00 2019-12-10 11:22:26.878824960   2019-12-10 11:25:16.221470220   26  3
17  Flightname401   2019-12-10 11:55:00 2019-12-10 11:56:05.663474700   2019-12-10 11:58:55.006119960   20  19
15  Flightname391   2019-12-10 12:20:00 2019-12-10 12:21:21.215350260   2019-12-10 12:21:36.767225880   18  10
4   Flightname359   2019-12-10 12:40:00 2019-12-10 12:42:26.878824960   2019-12-10 12:45:16.221470220   7   2
7   Flightname364   2019-12-10 13:00:00 2019-12-10 13:02:36.382748940   2019-12-10 13:05:25.725394200   10  

Example of output dictionary. Where the the flightindex's that are in several periods are flights that are in the rolling time window.

{'period_1': {'Flightname idx': [177, 181], 'Baggage': [103, 87]},
 'period_2': {'Flightname  idx': [177, 102, 115, 164],
  'Baggage': [103, 59, 16, 8]},
 'period_3': {'Flightname  idx': [164, 173, 174], 'Baggage': [27, 50, 8]},
 'period_4': {'Flightname idx': [70, 77, 118], 'Baggage': [19, 3, 24]},
 'period_5': {'Flightname idx': [70, 98], 'Baggage': [19, 4]},
  • 1
    I have some question: Can you provide an example of the exact dict you want? Can the same ArrivalFlightName appear serveral time within your window of 30mins or is it unique? – tturbo Oct 12 '22 at 12:01
  • 1
    Hi @tturbo, Flightnames are unique but they can appear in different time windows. An example of the output dict has been added :-) Thanks – Nora Seinfield Oct 12 '22 at 12:09
  • 1
    do you really want a rolling window, if so with what steps size (e.g. every minute, meaning start time of every window is 1 minute different)? or do you just want slices of 30min? – tturbo Oct 12 '22 at 12:51
  • The goal is to create a 30 minute rolling window every 10 minutes. So 07:10-07:40, 07:20-07:50 etc. So flights are repeated in different windows. Does that make sense? :-) – Nora Seinfield Oct 12 '22 at 13:02

1 Answers1

0

I try to give you some hints. However my code is not tested, as i don't have your entire input to test (and also no time to do so ;).

For that you want window with length 30min and step size 10min, you must set these values in the rolling function. Also, your interested not only in row 0 (not a very clear name), but also in the 'ArrivalFlightName'

# set datetime index and selecting rows you are interested in
df_temp = df_conflict.set_index('sta+time_to_infeed_left')['ArrivalFlightName', 0]

# convert in rolling window of 30min and step size 10min
df_temp.rolling('30m', step='10m') # not yet aggregated...

# next, you want to aggregate the values into lists
df_temp = df_temp.agg({'ArrivalFlightName': lambda x: list(x), {0: lambda x: list(x)})

I hope this helps... More details on aggregating into list

tturbo
  • 680
  • 5
  • 16
  • Thanks for taking your time! Means a lot! I've been down that road as well though not with the aggregating list - I will look into that. There is no stepsize in the the rolling function though, and doesn't rolling look at is presedence and not look forward? Sorry for all the questions :-) – Nora Seinfield Oct 12 '22 at 13:51
  • Your welcome, if it helps then please consider accept it. *"There is no stepsize in the the rolling function though"* what do you mean, there is a `step` attribute according to the [docs](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.rolling.html)? *"and doesn't rolling look at is presedence and not look forward"* also not understanding what you mean be that. I think you should play around and if it does not do what you want ask an other question with your code, your not-wanted result and your desired result – tturbo Oct 12 '22 at 14:04