I have a data frame with measurements every 10 minutes (n = 85000 values, so the solution should be efficient). I need to count how often a value is larger than 400, but only if this limit is exceeded for at least 30 minutes (i.e. we search for groups of at least 3 consecutive rows that meet the criterion lws > 400).
An example would be:
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'datetime': pd.date_range('1/1/2011', periods=20, freq='10Min'),
    'lws': np.random.randint(300, 500, size=(20,)),
})
df.set_index('datetime', inplace=True)
df['alarm'] = np.where(df['lws'] > 400, 1, 0)
gives
datetime,lws,alarm
2011-01-01 00:00:00,338,0
2011-01-01 00:10:00,435,**1
2011-01-01 00:20:00,415,**1
2011-01-01 00:30:00,321,0
2011-01-01 00:40:00,489,**1
2011-01-01 00:50:00,327,0
2011-01-01 01:00:00,481,**1
2011-01-01 01:10:00,316,0
2011-01-01 01:20:00,317,0
2011-01-01 01:30:00,455,1
2011-01-01 01:40:00,457,1
2011-01-01 01:50:00,442,1
2011-01-01 02:00:00,407,1
2011-01-01 02:10:00,407,**1
2011-01-01 02:20:00,416,**1
2011-01-01 02:30:00,331,0
2011-01-01 02:40:00,367,0
2011-01-01 02:50:00,360,0
2011-01-01 03:00:00,416,**1
2011-01-01 03:10:00,341,0
The result should be a new data frame column containing a 1 if the criterion is met and a 0 if not; rows where my trial's assumption turned out wrong are marked with ** (two asterisks).
The fast answer here is not working correctly. I have now found an approach that actually works, but it needs additional columns and might be slow for huge datasets:
m = df['lws'] > 400
df['m'] = np.where(m, 1, 0)
df['differ'] = df.m.diff().ne(0).cumsum()            # label each run of consecutive equal values
df['gs'] = df.groupby('differ')['m'].transform(len)  # length of the run each row belongs to
df['flag'] = np.where((df['gs'] > 2) & m, 1, 0)      # alarm rows inside runs of 3 or more
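For reference, the same run-length logic can be chained into one expression without storing the helper columns. This is only a sketch of that idea; to make it reproducible, the lws values are hard-coded from the example table above instead of using np.random:

```python
import numpy as np
import pandas as pd

# example data hard-coded from the table above (instead of np.random)
df = pd.DataFrame({
    'datetime': pd.date_range('1/1/2011', periods=20, freq='10Min'),
    'lws': [338, 435, 415, 321, 489, 327, 481, 316, 317, 455,
            457, 442, 407, 407, 416, 331, 367, 360, 416, 341],
}).set_index('datetime')

m = df['lws'] > 400
# label runs of consecutive equal values, then measure each run's length;
# a row is flagged only if it is an alarm row inside a run of >= 3 alarms
runs = m.ne(m.shift()).cumsum()
df['flag'] = np.where(m & m.groupby(runs).transform('size').ge(3), 1, 0)
# with these values, only the six rows from 01:30 to 02:20 are flagged
```

This is the same diff/cumsum grouping trick as above, just kept in temporary Series (m, runs) instead of data frame columns.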
Can someone find a solution that does not involve the additional columns?
(The grouping idea comes from here.)
Many thanks for any help!