I have two large lists, t and y, and I want to determine efficiently at which times, and for how long, the data in y meets or exceeds a predefined limit, i.e. y >= limit.

The problem may be illustrated with the following sample data:

t = [0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16]
y = [8,6,4,2,0,2,4,6,8,6,4,2,0,2,4,6,8]
limit = 4

In this example, the code should return the following lists:

t_exceedance_start = [0,6,14]
t_how_long_above_limit = [2,4,2]

I would expect that this can be implemented quite elegantly in NumPy, but I have not figured out how.

Any suggestions are highly appreciated.

Rickson

1 Answer

Here's one vectorized approach that relies on boolean masks for efficiency:

import numpy as np

# Convert to arrays if they aren't already
y = np.asarray(y)
t = np.asarray(t)

# Build a mask of the thresholded y, padded with False on both sides.
# Comparing the mask with a copy of itself shifted by one element
# (done in the next step) marks the boundaries of each island of True values.
# The padded False values act as triggers that catch the start of the
# first island and the end of the last island.
mask = np.concatenate(( [False], y>=limit, [False] ))
idx = np.flatnonzero(mask[1:] != mask[:-1])

# idx alternates between island starts and one-past-the-end positions.
# The starting indices are the entries of idx at steps of 2, beginning at 0.
# The ending indices are the entries at steps of 2, beginning at 1, minus 1.
# lens is the index span from the first to the last sample of each island.
starts = idx[::2]
ends = idx[1::2] - 1
lens = ends - starts

# Map starts, ends and lengths onto the time values in t
start_times = t[starts]
end_times = t[ends]
len_times = end_times - start_times
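
As a quick check, the steps above can be wrapped in a function and run on the sample data from the question. This is only a sketch: the function name `exceedance_intervals` and its return layout are assumptions made for illustration, not part of the original answer.

```python
import numpy as np

def exceedance_intervals(t, y, limit):
    """Return (start_times, durations) for each run where y >= limit.

    Hypothetical helper that wraps the steps shown above.
    """
    t = np.asarray(t)
    y = np.asarray(y)
    # Pad with False so runs touching either end of y are detected too
    mask = np.concatenate(([False], y >= limit, [False]))
    idx = np.flatnonzero(mask[1:] != mask[:-1])
    starts = idx[::2]        # first index of each run
    ends = idx[1::2] - 1     # last index of each run
    return t[starts], t[ends] - t[starts]

t = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16]
y = [8, 6, 4, 2, 0, 2, 4, 6, 8, 6, 4, 2, 0, 2, 4, 6, 8]

start_times, durations = exceedance_intervals(t, y, limit=4)
print(start_times.tolist())  # [0, 6, 14]
print(durations.tolist())    # [2, 4, 2]
```

Because the durations are taken from t rather than from index counts, the same code should also work for time stamps that are not equidistant.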
Divakar
  • Hmmm. start_times is computed correctly, but there are some issues with lens. The resulting list basically consists only of zeros (and some ones). Could it be because the real time vector has a very fine resolution (0.0001, 0.00012, 0.00013, ...) and its time stamps are not equidistant? – Rickson Sep 20 '17 at 12:31
  • @Rickson What does `(np.asarray(y)>=limit).sum()` give you? – Divakar Sep 20 '17 at 12:35
  • 77. length of lens and start_times is 75. – Rickson Sep 20 '17 at 12:44
  • @Rickson Let me ask you - If instead of `y = [8,6,4,2,0,2,4,6,8,6,4,2,0,2,4,6,8]`, you had : `y = [2,2,4,2,0,2,4,6,8,6,4,2,0,2,4,6,8]`, i.e. the first two elements changed, what would be the output? Some lengths would be zeros, because the way you are interpreting lengths is not *exactly* correct. – Divakar Sep 20 '17 at 13:00
  • True. I would need a list "ends" as well to be able to calculate each interval length (e.g. ends[i]-starts[i] for each interval i). – Rickson Sep 20 '17 at 13:08
  • @Rickson Can you guess from code what would be `ends`? :) – Divakar Sep 20 '17 at 13:10
  • I had to make some adaptations to your answer. I have added the new proposal to my original question. Could you please review it and let me know whether I got something wrong? If everything is correct, you may update your answer accordingly so I can accept it as the final solution. Thanks for your help. Appreciate it! – Rickson Sep 20 '17 at 19:03
  • @Rickson Review on what exactly? Is something not working? – Divakar Sep 20 '17 at 19:05
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/154937/discussion-between-rickson-and-divakar). – Rickson Sep 20 '17 at 19:06
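
The zeros discussed in the comments can be reproduced with the modified y from Divakar's comment. A run that contains a single sample starts and ends at the same time stamp, so its duration is zero. The sketch below also counts the samples per run; the name `n_samples` is an assumption added for illustration.

```python
import numpy as np

# Modified sample data from the comment: the first run is a single sample
y = np.array([2, 2, 4, 2, 0, 2, 4, 6, 8, 6, 4, 2, 0, 2, 4, 6, 8])
t = np.arange(len(y))
limit = 4

mask = np.concatenate(([False], y >= limit, [False]))
idx = np.flatnonzero(mask[1:] != mask[:-1])
starts = idx[::2]
ends = idx[1::2] - 1

len_times = t[ends] - t[starts]   # duration between first and last sample
n_samples = ends - starts + 1     # number of samples in each run

print(len_times.tolist())   # [0, 4, 2]
print(n_samples.tolist())   # [1, 5, 3]
```

The sample counts sum to `(y >= limit).sum()`. Applied to the figures reported in the comments, 77 samples spread over 75 runs means that at least 73 runs contain exactly one sample, which would explain why most lengths come out as zero.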