
With dataset df I plotted a graph looking like the following:

df

Time    Temperature
8:23:04     18.5
8:23:04     19
9:12:57     19
9:12:57     20
9:12:58     20
9:12:58     21
9:12:59     21
9:12:59     23
9:13:00     23
9:13:00     25
9:13:01     25
9:13:01     27
9:13:02     27
9:13:02     28
9:13:03     28

[Figure: overall graph]

Zooming in on the data shows more detail:

[Figure: zoomed-in view of the data]

I would like to count the number of activations of this temperature measurement device. Each activation causes the temperature to rise sharply. I have defined an activation as follows:

Let T0, T1, T2, T3 be the temperatures at times t=0, t=1, t=2, t=3, and let d0 = T1-T0, d1 = T2-T1, d2 = T3-T2, ... be the differences between adjacent values.

If

1) d0 ≥ 0 and d1 ≥ 0 and d2 ≥ 0, and

2) T2- T0 > max(d0, d1, d2), and

3) T2-T0 < 30 seconds

then it is considered an activation. I want to count how many activations there are in total. What's a good way to do this?
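For concreteness, the three conditions can be read as a check over each window of four consecutive readings. The following is only a sketch under stated assumptions: condition 3 is interpreted as a limit on the time elapsed between the first and third readings (the original wording leaves this open), timestamps are given in seconds, and the names `activation_windows`, `times_s` and `max_span_s` are hypothetical. The example readings are made up.

```python
import numpy as np

def activation_windows(temps, times_s, max_span_s=30.0):
    """Return the start index of every 4-reading window meeting the three conditions."""
    temps = np.asarray(temps, dtype=float)
    times_s = np.asarray(times_s, dtype=float)
    d = np.diff(temps)  # d[i] = T[i+1] - T[i]
    starts = []
    for i in range(len(temps) - 3):
        d0, d1, d2 = d[i], d[i + 1], d[i + 2]
        # Condition 1: no decrease across the window
        non_decreasing = d0 >= 0 and d1 >= 0 and d2 >= 0
        # Condition 2: rise over the first two steps exceeds the largest single step
        net_rise = temps[i + 2] - temps[i] > max(d0, d1, d2)
        # Condition 3 (ASSUMED to be about elapsed time, not temperature)
        short_span = times_s[i + 2] - times_s[i] < max_span_s
        if non_decreasing and net_rise and short_span:
            starts.append(i)
    return starts

# Made-up readings, one per second
temps = [20, 22, 24, 25, 25, 24, 30]
times_s = [0, 1, 2, 3, 4, 5, 6]
print(activation_windows(temps, times_s))  # [0, 1]
```

Overlapping windows are reported separately here, so one sustained rise can match more than once. Whether such matches should be merged into a single activation is a design choice the definition does not settle.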

Thanks.

nilsinelabore

1 Answer


There are several valid answers, depending on how a spike is defined.

Assuming you just want the indices where the temperature increases significantly, one simple method is to look for jumps in value that exceed some threshold. The threshold can be derived from the mean absolute difference between consecutive samples, which gives a rough idea of where the significant changes occur. Here's a basic implementation:

import numpy as np

# Example data
x = np.array([0, 1, 2, 50, 51, 52, 53, 100, 99, 98, 97, 96, 10, 9, 8, 80])

# Differences between consecutive samples
xdiff = np.diff(x)

# Mean absolute change
xdiff_mean = np.abs(xdiff).mean()

# Flag increases that exceed the mean absolute change by more than 1
spikes = xdiff > xdiff_mean + 1
print(x[1:][spikes])  # prints 50, 100, 80
print(np.where(spikes)[0] + 1)  # prints 3, 7, 15

You could also use outlier rejection, which would be much smarter than this basic comparison to the mean difference. There are lots of answers on how to do that, for example: Can scipy.stats identify and mask obvious outliers?
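One simple form of outlier-based thresholding is to flag differences that lie more than n standard deviations above the mean difference. The snippet below is a minimal sketch of that idea on the same example array. The function name `spike_indices` and the cutoff `n_sigma=1.0` are assumptions chosen for illustration, and the cutoff would need tuning for real data.

```python
import numpy as np

def spike_indices(x, n_sigma=1.0):
    """Indices in x where the increase from the previous sample exceeds mean + n_sigma * std."""
    xdiff = np.diff(np.asarray(x, dtype=float))
    threshold = xdiff.mean() + n_sigma * xdiff.std()
    # +1 converts an index into xdiff to the index of the sample after the jump
    return np.flatnonzero(xdiff > threshold) + 1

x = [0, 1, 2, 50, 51, 52, 53, 100, 99, 98, 97, 96, 10, 9, 8, 80]
print(spike_indices(x).tolist())  # [3, 7, 15]
```

With this data the standard deviation of the differences is large (about 33) because of the single big drop, so a cutoff of two sigma would flag nothing. A median-based statistic would be less sensitive to that.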

Michael
  • Mean plus n-sigma is a more traditional way. – Mad Physicist Feb 05 '20 at 03:28
  • Hi Michael, thanks for the answer. I just realised that I had a bit of misunderstanding about the dataset and oversimplified the problem. As such, I've edited the question and added some pseudo code. But it'd be great if you could keep your original code as I feel it provides some very good insight too. Thanks a lot. Much appreciate your contribution. – nilsinelabore Feb 05 '20 at 05:00