The graph shows water temperature against time. When there is an activation, the temperature increases. When the activation ends, the temperature starts decreasing (although sometimes with a time lag).
I would like to count the number of events (each blue circle represents one activation). There are also periods of random noise (red circles): the temperature changes, but it only increases or only decreases, never both, which implies that it is not a proper event.
A new temperature record is written for every 0.5°C change in temperature, regardless of time.
I have tried using 1) the temperature difference and 2) the temperature gradient between adjacent data points to identify event start and end timestamps, counting each start/end pair as one event. But this is not very accurate.
I have been told that I should use only the temperature difference and identify the pattern (increase, maximum temperature, decrease) as one event. Any ideas on an appropriate way to calculate the total number of activations?
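A minimal sketch of that suggestion, assuming the readings are already in time order. Because every record is a ±0.5°C step, only the sign of each difference matters: an event is a rise immediately followed by a fall (the maximum sits between them), while a noise episode that only rises or only falls contains no such turn and is not counted. A time lag before the fall does not matter, since only the order of readings is used. `count_events` is a hypothetical helper name, not part of any library.

```python
import numpy as np


def count_events(temps):
    """Count (increase -> maximum -> decrease) patterns in a temperature series.

    Only the sign of consecutive differences is used; zero differences
    (repeated readings) are dropped so a flat top still counts as one peak.
    """
    diffs = np.diff(np.asarray(temps, dtype=float))
    signs = np.sign(diffs)
    signs = signs[signs != 0]
    # A peak is a rise (+1) immediately followed by a fall (-1).
    peaks = (signs[:-1] > 0) & (signs[1:] < 0)
    return int(np.count_nonzero(peaks))


# Temperatures from the sample data: one rise to 22.0, then a fall.
print(count_events([19.5, 20.0, 20.5, 21.0, 21.5, 22.0, 21.5, 21.0, 21.5]))  # 1
```

Counting turns rather than thresholding the gradient avoids picking a slope cut-off, at the cost of also counting small genuine bumps; a minimum rise before the peak could be added if those turn out to be noise.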
Update1:
Sample Data:
          id                timestamp  temperature
27581  27822  2020-01-02 07:53:05.173         19.5
27582  27823  2020-01-02 07:53:05.273         20.0
27647  27888  2020-01-02 10:01:46.380         20.5
27648  27889  2020-01-02 10:01:46.480         21.0
27649  27890  2020-01-02 10:01:48.463         21.5
27650  27891  2020-01-02 10:01:48.563         22.0
27711  27952  2020-01-02 10:32:19.897         21.5
27712  27953  2020-01-02 10:32:19.997         21.0
27861  28102  2020-01-02 11:34:41.940         21.5
...
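To locate events in a frame shaped like this sample rather than just count them, one option is to mark the last rising step before a fall, which is the row holding the maximum temperature. This is a sketch on the rows shown above only; `idx` is a hypothetical name for the unnamed index column, and the full data set is assumed to be sorted by `timestamp`.

```python
import io

import numpy as np
import pandas as pd

# The sample rows as CSV; "idx" is a hypothetical name for the unnamed index.
raw = """idx,id,timestamp,temperature
27581,27822,2020-01-02 07:53:05.173,19.5
27582,27823,2020-01-02 07:53:05.273,20.0
27647,27888,2020-01-02 10:01:46.380,20.5
27648,27889,2020-01-02 10:01:46.480,21.0
27649,27890,2020-01-02 10:01:48.463,21.5
27650,27891,2020-01-02 10:01:48.563,22.0
27711,27952,2020-01-02 10:32:19.897,21.5
27712,27953,2020-01-02 10:32:19.997,21.0
27861,28102,2020-01-02 11:34:41.940,21.5
"""
df = pd.read_csv(io.StringIO(raw), index_col="idx", parse_dates=["timestamp"])

# Direction of each recorded step: +1 for a rise, -1 for a fall.
# The first row has no predecessor (NaN) and repeated values give 0; drop both.
step = np.sign(df["temperature"].diff())
step = step[step.notna() & (step != 0)]

# The maximum of an event is reached by the last rise before a fall.
is_peak = (step > 0) & (step.shift(-1) < 0)
peak_times = df.loc[is_peak[is_peak].index, "timestamp"]
print(len(peak_times), peak_times.tolist())
```

On these rows it reports a single peak at 10:01:48.563 (22.0°C); the later rise to 21.5°C at 11:34 is not counted because no fall follows it within the sample.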
Update2:
Tried:
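import datetime
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.ticker as plticker
import matplotlib.dates as mdates
# df is assumed to be already loaded with 'timestamp' and 'temperature' columns
df['timestamp'] = pd.to_datetime(df['timestamp'])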
df['Date'] = [datetime.datetime.date(d) for d in df['timestamp']]
df['Date'] = pd.to_datetime(df['Date'])
df = df[df['Date'] == '2020-01-02']
# one does not need duplicate temperature values,
# because the task is to find changing values
df2 = df.loc[df['temperature'].shift() != df['temperature']]
# ye good olde forward difference
der = np.diff(df2['temperature'])
# to have the same length as index
der = np.insert(der, len(der), np.nan)
# make it column
df2['sig'] = np.sign(der)
# temporary array
evts = np.zeros(len(der))
# we find the points where the sign changes from 1 to -1, i.e. crosses zero
evts[(df2['sig'].shift() != df2['sig'])&(0 > df2['sig'])] = 1.0
# make it column for plotting
df2['events'] = evts
# preparing plot
fig,ax = plt.subplots(figsize=(20,20))
ax.xaxis_date()
ax.xaxis.set_major_locator(plticker.MaxNLocator(20))
# temperature itself
ax.plot(df2['temperature'],'-xk')
ax2=ax.twinx()
# 'events'
ax2.plot(df2['events'],'-xg')
## uncomment next two lines for plotting of signum
# ax3=ax.twinx()
# ax3.plot(df2['sig'],'-m')
# x-axis tweaking
ax.xaxis.set_major_formatter(mdates.DateFormatter('%H:%M'))
minLim = '2020-01-02 00:07:00'
maxLim = '2020-01-02 23:59:00'
plt.xlim(mdates.date2num(pd.Timestamp(minLim)),
mdates.date2num(pd.Timestamp(maxLim)))
plt.show()
and got a blank graph with these messages:
/usr/local/lib/python3.6/dist-packages/ipykernel_launcher.py:31: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
/usr/local/lib/python3.6/dist-packages/ipykernel_launcher.py:38: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
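Two things in the attempt above probably explain this. The warning appears because `df2` is a filtered slice of `df`, and assigning new columns to a slice triggers pandas' view-versus-copy check; taking an explicit `.copy()` avoids it. The blank plot is likely because `df2` keeps the integer row index, so `ax.plot(df2['temperature'])` plots against 27582, 27583, ... while `plt.xlim` is set in date units, which puts every point outside the visible range. A minimal sketch that indexes by timestamp, assuming the `timestamp`/`temperature` columns from the sample data; `mark_peaks` is a hypothetical helper, and its event condition `(sig.shift() > 0) & (sig < 0)` is a deliberate change that only flags a genuine rise-then-fall (the original condition also flags the first row when a day starts with a fall).

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates


def mark_peaks(df: pd.DataFrame, day: str) -> pd.DataFrame:
    """Return one day's changing readings, indexed by timestamp, with an
    'events' column that is 1.0 at each local maximum."""
    out = df.copy()
    out["timestamp"] = pd.to_datetime(out["timestamp"])
    # A DatetimeIndex puts the points on the same date axis that xlim uses.
    out = out.set_index("timestamp").sort_index()
    out = out[out.index.normalize() == pd.Timestamp(day)]
    # Keep only changing values; .copy() makes an independent frame, so adding
    # columns below does not raise SettingWithCopyWarning.
    out = out.loc[out["temperature"].shift() != out["temperature"]].copy()
    # Forward difference: sign of (next reading - current reading).
    sig = np.sign(out["temperature"].diff().shift(-1))
    out["sig"] = sig
    # Peak: the previous step rose and the next step falls.
    out["events"] = ((sig.shift() > 0) & (sig < 0)).astype(float)
    return out


if __name__ == "__main__":
    sample = pd.DataFrame({
        "timestamp": [
            "2020-01-02 07:53:05.173", "2020-01-02 07:53:05.273",
            "2020-01-02 10:01:46.380", "2020-01-02 10:01:46.480",
            "2020-01-02 10:01:48.463", "2020-01-02 10:01:48.563",
            "2020-01-02 10:32:19.897", "2020-01-02 10:32:19.997",
            "2020-01-02 11:34:41.940",
        ],
        "temperature": [19.5, 20.0, 20.5, 21.0, 21.5, 22.0, 21.5, 21.0, 21.5],
    })
    day = mark_peaks(sample, "2020-01-02")
    fig, ax = plt.subplots(figsize=(20, 6))
    ax.plot(day.index, day["temperature"], "-xk")  # x values are datetimes
    ax2 = ax.twinx()
    ax2.plot(day.index, day["events"], "-xg")
    ax.xaxis.set_major_formatter(mdates.DateFormatter("%H:%M"))
    print(int(day["events"].sum()))
    plt.show()
```

Passing `day.index` explicitly as the x values makes the date axis independent of whatever row index the frame had before.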
Update3:
I wrote a for loop to generate a graph for each day:
df['timestamp'] = pd.to_datetime(df['timestamp'])
df['Date'] = df['timestamp'].dt.date
df.set_index(df['timestamp'], inplace=True)
start_date = pd.to_datetime('2020-01-01 00:00:00')
end_date = pd.to_datetime('2020-02-01 00:00:00')
df = df.loc[(df.index >= start_date) & (df.index <= end_date)]
for date in df['Date'].unique():
    df_date = df[df['Date'] == date]
    # one does not need duplicate temperature values,
    # because the task is to find changing values
    df2 = df_date.loc[df_date['temperature'].shift() != df_date['temperature']].copy()
    # ye good olde forward difference
    der = np.sign(np.diff(df2['temperature']))
    # to have the same length as index
    der = np.insert(der, len(der), np.nan)
    # make it column
    df2['sig'] = der
    # temporary array
    evts = np.zeros(len(der))
    # we find the points where the sign changes from 1 to -1, i.e. crosses zero
    evts[(df2['sig'].shift() != df2['sig']) & (0 > df2['sig'])] = 1.0
    # make it column for plotting
    df2['events'] = evts
    # preparing plot
    fig, ax = plt.subplots(figsize=(30, 10))
    ax.xaxis_date()
    # df2['timestamp'] = pd.to_datetime(df2['timestamp'])
    ax.xaxis.set_major_locator(plticker.MaxNLocator(20))
    # temperature itself
    ax.plot(df2['temperature'], '-xk')
    ax2 = ax.twinx()
    # 'events'
    g = ax2.plot(df2['events'], '-xg')
    # x-axis tweaking
    ax.xaxis.set_major_formatter(mdates.DateFormatter('%H:%M'))
    minLim = '2020-01-02 00:07:00'
    maxLim = '2020-01-02 23:59:00'
    plt.xlim(mdates.date2num(pd.Timestamp(minLim)),
             mdates.date2num(pd.Timestamp(maxLim)))
    ax.autoscale()
    plt.title(date)
    print(np.count_nonzero(df2['events'][minLim:maxLim]))
    plt.show()
The graphs are produced correctly, but the event counts are not.
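One likely reason: `minLim` and `maxLim` are fixed to 2020-01-02 inside the loop, so `df2['events'][minLim:maxLim]` selects nothing on every other day and `np.count_nonzero` prints 0 there. The condition `shift() != sig` can also flag the first row of a day whose first step is a fall, adding a spurious event. A minimal sketch that counts per calendar day with `groupby` instead of hard-coded limits, assuming `timestamp`/`temperature` columns; `count_events_per_day` is a hypothetical helper name.

```python
import numpy as np
import pandas as pd


def count_events_per_day(df: pd.DataFrame) -> pd.Series:
    """Count rise-then-fall peaks for each calendar day."""
    data = df.assign(timestamp=pd.to_datetime(df["timestamp"])).sort_values("timestamp")

    def peaks_in(temp: pd.Series) -> int:
        # Drop repeated readings, as in the question's code.
        temp = temp[temp.shift() != temp]
        # Forward difference: sign of (next reading - current reading).
        sig = np.sign(temp.diff().shift(-1))
        # A peak: the previous step rose and the next step falls.
        return int(((sig.shift() > 0) & (sig < 0)).sum())

    # One count per date; each day is processed independently.
    return data.groupby(data["timestamp"].dt.date)["temperature"].apply(peaks_in)
```

Calling `print(count_events_per_day(df))` gives one row per date. Because each day is handled separately, an event whose peak straddles midnight is split between two days and may not be counted.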
Update4:
It looks like some graphs (e.g. 2020-01-01, 2020-01-04, 2020-01-05) cover only a random fragment of the day (probably weekends). Is there a way to delete these days?
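A minimal sketch for dropping those days before counting, assuming a `timestamp` column. Note that 2020-01-01 was a Wednesday (New Year's Day), so a weekday filter alone would keep it; the sketch therefore also accepts an optional list of extra dates to drop. `drop_weekends` and its `holidays` parameter are hypothetical names.

```python
import pandas as pd


def drop_weekends(df: pd.DataFrame, holidays=()) -> pd.DataFrame:
    """Remove Saturday/Sunday rows and any rows on the given extra dates."""
    ts = pd.to_datetime(df["timestamp"])
    is_weekday = ts.dt.dayofweek < 5  # Monday=0 ... Friday=4
    is_holiday = ts.dt.normalize().isin(pd.to_datetime(list(holidays)))
    # .copy() so later column assignments do not warn about a slice
    return df.loc[is_weekday & ~is_holiday].copy()
```

For example, `drop_weekends(df, holidays=["2020-01-01"])` removes 2020-01-01, 2020-01-04 and 2020-01-05 from January 2020. An alternative would be to drop days with too few readings, but that needs a threshold chosen from the data.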