I'm working with pandas and need to remove the n consecutive rows that follow an event in a DataFrame, where n is the value in a column.

In the example below, there is an event at 17:00:00 that lasts for 2 seconds, so the 2 rows that follow it (17:00:01 and 17:00:02) should be dropped. There is another event at 17:00:04 that lasts 1 second, so row 17:00:05 should be dropped.

I'm unsure how to approach this. Should I use a mask inside a lambda?

t = pd.to_timedelta(df['EventSeconds'], unit='s')
mask = df['2019-01-07 17:00:02' : '2019-01-07 17:00:02' + t]

I have:

Index               EventSeconds OtherColumn
07/01/2019 16:59:59 0            2
07/01/2019 17:00:00 2            3
07/01/2019 17:00:01 0            4
07/01/2019 17:00:02 0            5
07/01/2019 17:00:03 0            6
07/01/2019 17:00:04 1            7
07/01/2019 17:00:05 0            8
07/01/2019 17:00:06 0            9

I need:

Index               EventSeconds OtherColumn
07/01/2019 16:59:59 0            2
07/01/2019 17:00:00 2            3
07/01/2019 17:00:03 0            6
07/01/2019 17:00:04 1            7
07/01/2019 17:00:06 0            9
user6453877

1 Answer

You can add the duration to Index to get each event's end time, then use ffill to carry that end time forward onto the following rows, where EventSeconds is 0:

# Convert each event's duration in seconds to a Timedelta
t = pd.to_timedelta(df['EventSeconds'], unit='s')

# Print end_times to see the details
end_times = (df['Index'].add(t)                   # calculate the end time
                .where(df['EventSeconds'].ne(0))  # keep end times only on event rows
                .ffill()                          # carry each end time forward
            )

# Keep rows past the latest end time, plus the event rows themselves
df[df['Index'].gt(end_times) | df['EventSeconds'].ne(0)]

Output:

                Index  EventSeconds
0 2019-07-01 16:59:59             0
1 2019-07-01 17:00:00             2
4 2019-07-01 17:00:03             0
5 2019-07-01 17:00:04             1
7 2019-07-01 17:00:06             0
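
The snippet above reads the timestamps from a column named Index, while the question's table shows them in the DataFrame index. Below is a minimal sketch of one way to get from one layout to the other. It assumes the index holds day-first strings such as 07/01/2019 (7 January, which matches the slice in the question; the output above looks like it was parsed month-first). The name raw and the hand-built frame are only for illustration.

```python
import pandas as pd

# Hypothetical reconstruction of the question's frame: timestamps as
# day-first strings in the index (an assumption, not confirmed by the source).
raw = pd.DataFrame(
    {
        'EventSeconds': [0, 2, 0, 0, 0, 1, 0, 0],
        'OtherColumn': [2, 3, 4, 5, 6, 7, 8, 9],
    },
    index=[
        '07/01/2019 16:59:59', '07/01/2019 17:00:00',
        '07/01/2019 17:00:01', '07/01/2019 17:00:02',
        '07/01/2019 17:00:03', '07/01/2019 17:00:04',
        '07/01/2019 17:00:05', '07/01/2019 17:00:06',
    ],
)

# Parse with an explicit day-first format so 07/01 is read as 7 January
raw.index = pd.to_datetime(raw.index, format='%d/%m/%Y %H:%M:%S')

# Move the timestamps into a regular column called 'Index'
df = raw.rename_axis('Index').reset_index()
```

If the index is already a DatetimeIndex, only the last line should be needed.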
Quang Hoang
  • This works, but all rows before the first events are set to NaT in t and lost in the output. Any suggestions on how to add the NaT rows? – user6453877 Oct 04 '19 at 10:04
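
One possible way to keep those rows: rows that come before the first event have nothing to forward-fill from, so their end_times entry stays NaT and the comparison against it is False. A hedged sketch, assuming the timestamps are in a column named Index as in the answer, adds an isna() check to the mask. The names keep and result, and the hand-built sample frame, are illustrative only.

```python
import pandas as pd

# Sample data from the question, with the timestamps as a regular column
start = pd.Timestamp('2019-01-07 16:59:59')
df = pd.DataFrame({
    'Index': start + pd.to_timedelta(range(8), unit='s'),
    'EventSeconds': [0, 2, 0, 0, 0, 1, 0, 0],
    'OtherColumn': [2, 3, 4, 5, 6, 7, 8, 9],
})

t = pd.to_timedelta(df['EventSeconds'], unit='s')
end_times = df['Index'].add(t).where(df['EventSeconds'].ne(0)).ffill()

keep = (
    df['Index'].gt(end_times)     # row lies after the latest event window
    | df['EventSeconds'].ne(0)    # row is itself an event
    | end_times.isna()            # no event seen yet, so nothing to drop
)
result = df[keep]
print(result)
```

On the sample data this keeps the rows whose OtherColumn values are 2, 3, 6, 7 and 9, which is the result the question asks for.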