
So we have a Pandas DataFrame with certain values at certain times.

For example:

         @ts                  @value  Glucose  Diff   smooth_diff  new    P      N      C1   C2
    135  2021-10-29 11:16:00  167     167.0    -3.0   15.45        15.45  17.95  17.45  NaN  0.0
    155  2021-10-29 12:56:00  162     162.0    -15.0  15.35        15.35  17.95  16.00  NaN  0.0
    243  2021-10-29 20:16:00  133     133.0    0.0    15.25        15.25  19.85  15.75  NaN  0.0
    245  2021-10-29 20:26:00  134     134.0    0.0    15.50        15.50  15.75  15.60  NaN  0.0
    113  2021-10-29 09:26:00  130     130.0    1.0    16.75        16.75  0.00   21.70  NaN  NaN

Now we want to drop every row whose @ts is within one hour of the previous row. In this example, we want to drop the row at 2021-10-29 20:26:00 because it is only 10 minutes after the 20:16:00 row. We can't figure out a way to do this, though.
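
This is a minimal sketch that makes "within one hour of the previous row" concrete. It assumes only the @ts and @value columns matter. It rebuilds those two columns from the table, parses @ts with pd.to_datetime, and sorts by time, which is needed because the sample is not in chronological order (the 09:26:00 row is listed last). It then computes the gap to the previous row with Series.diff(). After sorting, only the 20:26:00 row (index 245) has a gap under an hour.

```python
import pandas as pd

# Only @ts and @value from the sample table; the other columns are omitted.
df = pd.DataFrame({
    "@ts": ["2021-10-29 11:16:00", "2021-10-29 12:56:00", "2021-10-29 20:16:00",
            "2021-10-29 20:26:00", "2021-10-29 09:26:00"],
    "@value": [167, 162, 133, 134, 130],
}, index=[135, 155, 243, 245, 113])

# Parse the strings into real timestamps and put the rows in time order.
df["@ts"] = pd.to_datetime(df["@ts"])
df = df.sort_values("@ts")

# Time since the previous row (NaT for the earliest row).
df["gap"] = df["@ts"].diff()
print(df)
```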

Any help?

BroodjeBal

2 Answers

Something like this might work. It keeps only the first row for each date and hour:

    import pandas as pd

    # create dataframe (only creating 2 cols for ease)
    df = pd.DataFrame({
        '@ts': ['2021-10-29 11:16:00', '2021-10-29 12:56:00',
                '2021-10-29 20:16:00', '2021-10-29 20:26:00'],
        '@value': [167, 162, 133, 134]
    })

    # split @ts column into separate columns: date (d) and time (t)
    df[["d", "t"]] = df["@ts"].str.split(" ", expand=True)

    # split time column into separate parts: hours, mins and secs
    df[["h", "m", "s"]] = df["t"].str.split(":", expand=True)

    # drop duplicates based on date and hour, keeping the first row.
    # This buckets by clock hour, so e.g. 11:50 and 12:10 are both kept.
    df = df.drop_duplicates(subset=["d", "h"], keep="first")

    # remove the helper columns again
    df = df.drop(columns=["d", "t", "h", "m", "s"])
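
Bucketing by date and hour is not quite the same as a one-hour window. The sketch below uses hypothetical timestamps (not from the question) and the same (date, hour) bucketing, built with the .dt.date and .dt.hour accessors instead of string splitting. The 11:50, 12:10 and 13:05 rows all survive, even though consecutive kept rows are only 20 and 55 minutes apart.

```python
import pandas as pd

# Hypothetical timestamps (not from the question) that straddle hour boundaries.
df = pd.DataFrame({"@ts": pd.to_datetime([
    "2021-10-29 11:50:00",
    "2021-10-29 12:10:00",
    "2021-10-29 12:30:00",
    "2021-10-29 13:05:00",
])})

# Same idea as the answer above: bucket by (date, hour), keep the first row per bucket.
df["d"] = df["@ts"].dt.date
df["h"] = df["@ts"].dt.hour
kept = df.drop_duplicates(subset=["d", "h"], keep="first")

# Gaps between consecutive kept rows: 20 min and 55 min, both under an hour.
print(kept["@ts"].diff())
```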
EM789

Convert the column to datetime, subtract the previous row's timestamp from each timestamp, and take the difference in total seconds. Take the absolute value and check whether it is less than 3600 seconds (one hour) to build a boolean mask. Then use the negated mask to keep only the rows that are at least an hour away from the previous row.

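    import numpy as np
    import pandas as pd
    df['@ts'] = pd.to_datetime(df['@ts'])
    # gap to the previous row in seconds; the first row gets inf so it is always kept
    gap = (df['@ts'] - df['@ts'].shift()).dt.total_seconds().fillna(np.inf).abs()
    df = df[~gap.lt(3600)]  # drop rows less than one hour from the previous row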
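
The shift-based filter above compares each row with the row immediately before it in the frame, including rows that are themselves dropped. It also assumes the rows are sorted by @ts. If you instead want to keep a row only when it is at least an hour after the last row you kept, you need a sequential pass. The following is a minimal sketch under that assumption. It uses a hypothetical helper `drop_within` and hypothetical readings taken every 20 minutes.

```python
import numpy as np
import pandas as pd


def drop_within(df, col="@ts", window=pd.Timedelta(hours=1)):
    """Hypothetical helper: keep a row only if it is at least `window`
    after the last row that was kept, processing rows in time order."""
    df = df.sort_values(col)
    keep = []
    last_kept = None
    for ts in df[col]:
        if last_kept is None or ts - last_kept >= window:
            keep.append(True)
            last_kept = ts
        else:
            keep.append(False)
    return df.loc[keep]


# Hypothetical readings every 20 minutes (not from the question).
readings = pd.DataFrame({"@ts": pd.to_datetime([
    "2021-10-29 20:00:00", "2021-10-29 20:20:00", "2021-10-29 20:40:00",
    "2021-10-29 21:00:00", "2021-10-29 21:20:00",
])})

# Previous-row comparison, as in the answer: each row after the first is only
# 20 minutes from its neighbour, so only 20:00 survives.
gap = (readings["@ts"] - readings["@ts"].shift()).dt.total_seconds().fillna(np.inf).abs()
by_previous_row = readings[~gap.lt(3600)]

# Last-kept comparison: 21:00 is a full hour after the kept 20:00 row, so it stays.
by_last_kept = drop_within(readings)
print(by_last_kept)
```

The loop runs row by row in Python, so it will be slower than the vectorised mask on large frames. However, whether a row is kept depends on which earlier rows were kept, so the rule does not map directly onto a single shift-and-compare.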
Nk03