2

I've looked around (eg. Python - Locating the closest timestamp) but can't find anything on this.

I have a list of datetimes, and a dataframe containing 10k + rows, of start and end times (formatted as datetimes).

The dataframe is effectively listing parameters for runs of an instrument.

The list describes times from an alarm event.

The datetime list items are all within a row (i.e. between a start and end time) in the dataframe. Is there an easy way to locate the rows which would contain the timeframe within which the alarm time would be? (sorry for poor wording there!)

eg.

for i in alarms:
    df.loc[(df.start_time < i) & (df.end_time > i), 'Flag'] = 'Alarm'

(this didn't work but shows my approach)

Example datasets

# making list of datetimes for the alarms

df = pd.DataFrame({'Alarms':["18/07/19 14:56:21", "19/07/19 15:05:15", "20/07/19 15:46:00"]})

df['Alarms'] = pd.to_datetime(df['Alarms'])

alarms = list(df.Alarms.unique())

# dataframe of runs containing start and end times
n=33
rng1 = pd.date_range('2019-07-18', '2019-07-22', periods=n)
rng2 = pd.date_range('2019-07-18 03:00:00', '2019-07-22 03:00:00', periods=n)
df = pd.DataFrame({ 'start_date': rng1, 'end_Date': rng2})

Herein a flag would go against line (well, index) 4, 13 and 21.

BAC83
  • 811
  • 1
  • 12
  • 27

2 Answers2

4

You can use pandas.IntervalIndex here:

# Create and set IntervalIndex
intervals = pd.IntervalIndex.from_arrays(df.start_date, df.end_Date)
df = df.set_index(intervals)

# Update using loc
df.loc[alarms, 'flag'] = 'alarm'

# Finally, reset_index
df = df.reset_index(drop=True)

[out]

            start_date            end_Date   flag
0  2019-07-18 00:00:00 2019-07-18 03:00:00    NaN
1  2019-07-18 03:00:00 2019-07-18 06:00:00    NaN
2  2019-07-18 06:00:00 2019-07-18 09:00:00    NaN
3  2019-07-18 09:00:00 2019-07-18 12:00:00    NaN
4  2019-07-18 12:00:00 2019-07-18 15:00:00  alarm
5  2019-07-18 15:00:00 2019-07-18 18:00:00    NaN
6  2019-07-18 18:00:00 2019-07-18 21:00:00    NaN
7  2019-07-18 21:00:00 2019-07-19 00:00:00    NaN
8  2019-07-19 00:00:00 2019-07-19 03:00:00    NaN
9  2019-07-19 03:00:00 2019-07-19 06:00:00    NaN
10 2019-07-19 06:00:00 2019-07-19 09:00:00    NaN
11 2019-07-19 09:00:00 2019-07-19 12:00:00    NaN
12 2019-07-19 12:00:00 2019-07-19 15:00:00    NaN
13 2019-07-19 15:00:00 2019-07-19 18:00:00  alarm
14 2019-07-19 18:00:00 2019-07-19 21:00:00    NaN
15 2019-07-19 21:00:00 2019-07-20 00:00:00    NaN
16 2019-07-20 00:00:00 2019-07-20 03:00:00    NaN
17 2019-07-20 03:00:00 2019-07-20 06:00:00    NaN
18 2019-07-20 06:00:00 2019-07-20 09:00:00    NaN
19 2019-07-20 09:00:00 2019-07-20 12:00:00    NaN
20 2019-07-20 12:00:00 2019-07-20 15:00:00    NaN
21 2019-07-20 15:00:00 2019-07-20 18:00:00  alarm
22 2019-07-20 18:00:00 2019-07-20 21:00:00    NaN
23 2019-07-20 21:00:00 2019-07-21 00:00:00    NaN
24 2019-07-21 00:00:00 2019-07-21 03:00:00    NaN
25 2019-07-21 03:00:00 2019-07-21 06:00:00    NaN
26 2019-07-21 06:00:00 2019-07-21 09:00:00    NaN
27 2019-07-21 09:00:00 2019-07-21 12:00:00    NaN
28 2019-07-21 12:00:00 2019-07-21 15:00:00    NaN
29 2019-07-21 15:00:00 2019-07-21 18:00:00    NaN
30 2019-07-21 18:00:00 2019-07-21 21:00:00    NaN
31 2019-07-21 21:00:00 2019-07-22 00:00:00    NaN
32 2019-07-22 00:00:00 2019-07-22 03:00:00    NaN
Chris Adams
  • 18,389
  • 4
  • 22
  • 39
1

you were calling your columns start_date and end_Date, but in your for you use start_time and end_time.

try this:

import pandas as pd

df = pd.DataFrame({'Alarms': ["18/07/19 14:56:21", "19/07/19 15:05:15", "20/07/19 15:46:00"]})

df['Alarms'] = pd.to_datetime(df['Alarms'])

alarms = list(df.Alarms.unique())

# dataframe of runs containing start and end times
n = 33
rng1 = pd.date_range('2019-07-18', '2019-07-22', periods=n)
rng2 = pd.date_range('2019-07-18 03:00:00', '2019-07-22 03:00:00', periods=n)
df = pd.DataFrame({'start_date': rng1, 'end_Date': rng2})


for i in alarms:
    df.loc[(df.start_date < i) & (df.end_Date > i), 'Flag'] = 'Alarm'
print(df[df['Flag']=='Alarm']['Flag'])

Output:

4     Alarm
13    Alarm
21    Alarm
Name: Flag, dtype: object
Adam.Er8
  • 12,675
  • 3
  • 26
  • 38