I have the following dataset that I'm hoping to apply some custom logic to:
import pandas as pd
data = pd.DataFrame({'ID': ['A','B','B','C','C','D','D'],
'Date': ['2018-07-02T02:21:12.000+0000','2018-07-02T02:28:29.000+0000','2018-07-02T02:28:31.000+0000','2018-07-02T02:30:58.000+0000','2018-07-02T02:31:01.000+0000','2018-07-02T02:42:46.000+0000','2018-07-02T02:41:47.000+0000'],
'Action': ['Start','Start','Start','Stop','Stop','Start','Start'],
'Group': [5,13,13,19,19,2,2],
'Value': [100,110,110,95,95,280,280]
})
Rows 1-2, 3-4, and 5-6 (0-indexed) are identical except for the values in the Date column, which differ by a matter of seconds. Is there a way to remove such duplicates when 1) the time difference between the Date values of similar rows is less than 1 minute and 2) all other columns are identical?
The result should look like the following:
result = pd.DataFrame({
'ID': ['A','B','C','D'],
'Date': ['2018-07-02T02:21:12.000+0000','2018-07-02T02:28:29.000+0000','2018-07-02T02:30:58.000+0000','2018-07-02T02:42:46.000+0000'],
'Action': ['Start','Start','Stop','Start'],
'Group': [5,13,19,2],
'Value': [100,110,95,280]
})
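For reference, here is a minimal sketch of the rule I have in mind, not necessarily the best way to do it. It assumes "similar" means every column other than Date matches, and it compares each row only with the previous row of the same group in the original order. The helper name drop_near_duplicates and its parameters are made up for illustration.

```python
import pandas as pd

data = pd.DataFrame({
    'ID': ['A', 'B', 'B', 'C', 'C', 'D', 'D'],
    'Date': ['2018-07-02T02:21:12.000+0000', '2018-07-02T02:28:29.000+0000',
             '2018-07-02T02:28:31.000+0000', '2018-07-02T02:30:58.000+0000',
             '2018-07-02T02:31:01.000+0000', '2018-07-02T02:42:46.000+0000',
             '2018-07-02T02:41:47.000+0000'],
    'Action': ['Start', 'Start', 'Start', 'Stop', 'Stop', 'Start', 'Start'],
    'Group': [5, 13, 13, 19, 19, 2, 2],
    'Value': [100, 110, 110, 95, 95, 280, 280],
})


def drop_near_duplicates(df, time_col='Date', window=pd.Timedelta(minutes=1)):
    """Hypothetical helper: drop a row when it matches the previous row of its
    group on every non-time column and lies less than `window` away from it."""
    # Every column except the timestamp must match for rows to be "similar".
    key_cols = [c for c in df.columns if c != time_col]
    # Parse the ISO 8601 strings; the original string column is left untouched.
    ts = pd.to_datetime(df[time_col], utc=True)
    # Absolute gap to the previous row within each group of identical keys.
    # abs() matters here because the two 'D' rows are not in time order.
    gap = ts.groupby([df[c] for c in key_cols]).diff().abs()
    # Keep the first row of each group (gap is NaT) and rows outside the window.
    keep = gap.isna() | (gap >= window)
    return df[keep].reset_index(drop=True)


deduped = drop_near_duplicates(data)
print(deduped)
```

Because each row is compared only with its immediate predecessor, a long run of rows that are each under a minute apart would collapse to the first row even if the run as a whole spans more than a minute. That is fine for my data but may not be the behaviour wanted in general.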