How do I split a dataframe based on datetime differences?

Question

Say I have this dataframe with datetimes separated by an unknown time interval:

data[0]:
        mintime                              check
1375    2020-02-18 12:17:51.275000064+00:00  GO1
1376    2020-02-18 12:17:56.484999936+00:00  GO1
1377    2020-02-18 12:18:06.020000+00:00     GO1
1378    2020-02-18 12:18:10.922000128+00:00  NOGO
1379    2020-02-18 14:47:48.353999872+00:00  GO2
1380    2020-02-18 14:47:48.768000+00:00     GO2
1381    2020-02-18 14:48:03.120000+00:00     GO2

I am trying to split the dataframe so that consecutive rows whose datetimes are no more than 15 seconds apart end up together in a new dataframe.

My attempt begins with the column check. That column tells whether the value on its row and the following value are within 15 seconds of each other (GO) or more than 15 seconds apart (NOGO).

I add a number after GO to be able to distinguish the groups of GOs. This is my attempt:

databds = []
intervalo = pd.Timedelta(seconds = 15)
p = 0
for x in range(0,len(data)):
    for y in range(0,len(data[x])-1):     
        t = pd.to_datetime(data[x]['mintime'][y][0:19])
        tp1 = pd.to_datetime(data[x]['mintime'][y+1][0:19])
        resta = tp1 - t
        if resta > intervalo:
            data[x]['check'][y] = "NOGO"
            p = p + 1
        else:
            data[x]['check'][y] = "{}{}".format("GO", p)   
    for z in range(0,p):
        datito = data[x].loc[data[x]['check'] == "{}{}".format("GO", z)]
        databds.append(datito)

This process is slow and demanding on resources. I believe there must be an easier way to do this. I have tried pandas resample, with no luck.

Does this answer your question? [Python Pandas: Group datetime column into hour and minute aggregations](https://stackoverflow.com/questions/16266019/python-pandas-group-datetime-column-into-hour-and-minute-aggregations) – Danail Petrov, May 28 '21 at 09:44
Hi @DanailPetrov. No, that does not answer the question. I do not want to resample on a fixed time interval such as a week, day or minute. I want to split the dataframe into smaller dataframes whose datetimes are 15 seconds apart or less. – Peter La Anguila, May 28 '21 at 09:51

Nk03 · Accepted Answer · 2021-05-28T16:17:56.827


UPDATED ANSWER:

Convert the mintime column to datetime with pd.to_datetime.
Take the difference between consecutive rows in seconds, fill the first row's NaN with 0, and flag every row whose difference is greater than 15 seconds. The cumulative sum (cumsum) of those flags gives each row a group number, which can be passed to groupby.

df.mintime = pd.to_datetime(df.mintime)
# A gap of more than 15 seconds starts a new group; cumsum numbers the groups
new_group = df.mintime.diff().dt.total_seconds().fillna(0) > 15
df_list = [g for _, g in df.groupby(new_group.cumsum())]

OUTPUT:

[                                 mintime check
 1375 2020-02-18 12:17:51.275000064+00:00   GO1
 1376 2020-02-18 12:17:56.484999936+00:00   GO1
 1377    2020-02-18 12:18:06.020000+00:00   GO1
 1378 2020-02-18 12:18:10.922000128+00:00  NOGO,
                                  mintime check
 1379 2020-02-18 14:47:48.353999872+00:00   GO2
 1380    2020-02-18 14:47:48.768000+00:00   GO2
 1381    2020-02-18 14:48:03.120000+00:00   GO2]
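
To see why the cumulative sum produces the groups, it may help to look at the intermediate steps one at a time. The following is only a sketch: the timestamps are the question's sample rows truncated to milliseconds, and the names gap, threshold, starts_new_group and group_id are made up for illustration. It also compares timedeltas directly instead of converting to seconds, which is assumed to be equivalent here.

```python
import pandas as pd

# Sample rows from the question, truncated to milliseconds (illustrative data)
df = pd.DataFrame(
    {
        "mintime": [
            "2020-02-18 12:17:51.275+00:00",
            "2020-02-18 12:17:56.485+00:00",
            "2020-02-18 12:18:06.020+00:00",
            "2020-02-18 12:18:10.922+00:00",
            "2020-02-18 14:47:48.354+00:00",
            "2020-02-18 14:47:48.768+00:00",
            "2020-02-18 14:48:03.120+00:00",
        ]
    },
    index=range(1375, 1382),
)
df["mintime"] = pd.to_datetime(df["mintime"])

# Step 1: gap to the previous row (NaT for the first row)
gap = df["mintime"].diff()

# Step 2: True where the gap exceeds the threshold, i.e. a new group starts;
# the first row's NaT compares as False, so it stays in group 0
threshold = pd.Timedelta(seconds=15)
starts_new_group = gap > threshold

# Step 3: the running count of group starts gives every row a group id
group_id = starts_new_group.cumsum()

# Step 4: one sub-dataframe per group id
df_list = [g for _, g in df.groupby(group_id)]

print(group_id.tolist())            # [0, 0, 0, 0, 1, 1, 1]
print([len(g) for g in df_list])    # [4, 3]
```

Only the jump from 12:18:10 to 14:47:48 exceeds 15 seconds, so the counter increases once and the rows fall into two groups.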

edited May 28 '21 at 16:17

answered May 28 '21 at 09:56

Nk03


This is close to the solution I want. The problem is that it returns the datetimes falling in fixed 15-second slots of each minute. That is, it groups the datetimes within the first 15 seconds of the minute, then those in the next 15 seconds, and so on until the minute ends. – Peter La Anguila May 28 '21 at 10:47

Top job here. Thank you very much for your help. – Peter La Anguila May 28 '21 at 11:02
