
I have the following arrays:

time = [1e-6, 2e-6, 3e-6, 4e-6, 5e-6, 6e-6, 7e-6, 8e-6, 9e-6, 10e-6]
signal = [0, 10, 3, 2, 1, 0, 10, 2, 2, 5]

and I want to remove (from both arrays) any data points whose signal is above a threshold value, together with a padding window around each of them:

threshold = 9
padding = 3e-6

So any index whose signal value is above 9, or whose time lies within the padding window centered on such a point, should be removed from both arrays. Note: the windows can overlap if two above-threshold points lie within one padding width of each other.

Example output:

time_out = [4e-6, 5e-6, 9e-6, 10e-6]
signal_out = [2, 1, 2, 5]
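To pin the requirement down, here is a plain-Python reference sketch. It assumes that `padding` is the total window width centered on each above-threshold sample (as clarified in the comments), so every sample within `padding/2` of such a sample is dropped. The function name is hypothetical, and at O(n × number of peaks) it is only meant for checking faster versions against, not for millions of samples.

```python
def remove_padded_peaks_loop(time, signal, threshold, padding):
    """Reference version: drop every sample within padding/2 of any
    sample whose signal exceeds threshold.

    Assumption: padding is the total window width, centered on each
    above-threshold sample.
    """
    half = padding / 2
    # Times of all samples that exceed the threshold
    peak_times = [t for t, s in zip(time, signal) if s > threshold]
    # Keep a sample only if it is farther than half a window from every peak
    keep = [all(abs(t - tp) > half for tp in peak_times) for t in time]
    time_out = [t for t, k in zip(time, keep) if k]
    signal_out = [s for s, k in zip(signal, keep) if k]
    return time_out, signal_out


time = [1e-6, 2e-6, 3e-6, 4e-6, 5e-6, 6e-6, 7e-6, 8e-6, 9e-6, 10e-6]
signal = [0, 10, 3, 2, 1, 0, 10, 2, 2, 5]
time_out, signal_out = remove_padded_peaks_loop(time, signal, threshold=9, padding=3e-6)
print(time_out)    # [4e-06, 5e-06, 9e-06, 1e-05]
print(signal_out)  # [2, 1, 2, 5]
```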

EDIT: this post is very similar, but it handles only a single index of an array, whereas I need to do it at multiple indexes (in the example above, time=2e-6 and time=7e-6): https://stackoverflow.com/a/66695205/12728698

joshp
  • could you include a sample that is not cut off along with an expected output? –  May 09 '22 at 23:07
  • @enke updated the question with an untruncated sample and example output – joshp May 09 '22 at 23:13
  • why is `time=1e-6` not selected? –  May 09 '22 at 23:25
  • because it's within the `padding=3e-6` range of `time=2e-6` which has a `signal` (10) > `threshold=9` – joshp May 09 '22 at 23:27
  • `time=4e-6` is selected even though it's within `3e-6` range of `time=2e-6`. Is this a mistake or do I not understand your problem? –  May 09 '22 at 23:41
  • sorry, I think I wasn't clear, padding is total, centered at e.g. `time=2e-6`, so `1e-6, 2e-6, 3e-6` are the first `padding` window that is deleted – joshp May 09 '22 at 23:44
  • @Rabinzel that is correct – joshp May 09 '22 at 23:45
  • Did you have a look at `scipy.signal.find_peaks`? I don't know if it helps in this specific case, but it sounds similar. – Rabinzel May 09 '22 at 23:49

1 Answer


Let's try this one. The idea is to create a boolean mask that is True for every sample lying outside the padding window of every above-threshold signal. Since the padding is the total window width centered on each such signal, each window extends padding/2 on either side of the peak's time (with padding=3e-6 and a 1e-6 step, that covers the peak and its two adjacent samples).

import numpy as np

time_arr = np.array(time)
signal_arr = np.array(signal)

# Window bounds around each above-threshold sample, shape (n_peaks, 1)
llim = time_arr[signal_arr > threshold, None] - padding/2
ulim = time_arr[signal_arr > threshold, None] + padding/2

# Keep a sample only if it lies outside every window (broadcasts to (n_peaks, n))
msk = ((llim > time_arr) | (ulim < time_arr)).all(axis=0)
time_out = time_arr[msk]
signal_out = signal_arr[msk]

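Another option is to use numpy.roll to compare each sample with its neighbors. This assumes uniform sampling where the padding covers exactly one sample on each side, as in the example. Because np.roll wraps around, the boundary entries are reset so that the first and last samples are not compared with each other:

comp = signal_arr <= threshold
prev_ok = np.roll(comp, 1)    # previous sample is not above threshold
prev_ok[0] = True             # undo the wrap-around at the start
next_ok = np.roll(comp, -1)   # next sample is not above threshold
next_ok[-1] = True            # undo the wrap-around at the end
msk = prev_ok & comp & next_ok
time_out = time_arr[msk]
signal_out = signal_arr[msk]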

Output:

array([4.e-06, 5.e-06, 9.e-06, 1.e-05])
array([2, 1, 2, 5])
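If the data are uniformly sampled but the padding spans more than one sample on each side, the neighbor comparison can be generalized with a sliding-window sum over a peak indicator. This is a sketch, not part of the answer above: the function name and the parameter `k` (number of samples to drop on each side of a peak) are hypothetical, and it assumes `2*k + 1 <= len(signal_arr)`, because `np.convolve` in `"same"` mode returns `max(M, N)` samples.

```python
import numpy as np


def remove_padded_peaks_uniform(time_arr, signal_arr, threshold, k):
    """Drop every above-threshold sample plus k samples on either side.

    Assumptions: uniform sampling (so the padding maps to a fixed number
    of samples) and 2*k + 1 <= len(signal_arr).
    """
    is_peak = (signal_arr > threshold).astype(np.int64)
    kernel = np.ones(2 * k + 1, dtype=np.int64)
    # Count of peaks within k samples of each position; np.convolve
    # zero-pads at the edges, so there is no wrap-around.
    near_peak = np.convolve(is_peak, kernel, mode="same") > 0
    return time_arr[~near_peak], signal_arr[~near_peak]


time_arr = np.array([1e-6, 2e-6, 3e-6, 4e-6, 5e-6, 6e-6, 7e-6, 8e-6, 9e-6, 10e-6])
signal_arr = np.array([0, 10, 3, 2, 1, 0, 10, 2, 2, 5])
# padding=3e-6 with a 1e-6 step covers one sample on each side, so k=1
time_out, signal_out = remove_padded_peaks_uniform(time_arr, signal_arr, threshold=9, k=1)
print(signal_out)  # [2 1 2 5]
```

The direct convolution costs O(n·k), which should stay cheap for the small windows described here.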
  • Never mind, it works, thanks! However, when I run it on my actual data, `msk = ((llim > time_arr) | (ulim< time_arr)).all(axis=0)` takes about 6 seconds for a vector of ~5 million elements. Any chance of a speedup? – joshp May 10 '22 at 00:23
  • @joshp That's probably because you have a lot of "peaks", so the broadcast boolean mask (one row per peak, one column per sample) becomes very large. I added another option; see if it works. –  May 10 '22 at 02:03
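For the 5-million-sample case with non-uniform spacing, one way to avoid the (n_peaks × n) matrix is to find each window's index range with `np.searchsorted` and mark the covered samples with a difference array. This is a sketch under the assumptions that `time_arr` is sorted ascending and that `padding` is the total centered window width; the function name is hypothetical and the approach has not been benchmarked on the asker's data. Memory use is O(n).

```python
import numpy as np


def remove_padded_peaks_sorted(time_arr, signal_arr, threshold, padding):
    """Drop every sample within padding/2 of any above-threshold sample.

    Assumptions: time_arr is sorted ascending; padding is the total
    window width centered on each above-threshold sample.
    """
    n = len(time_arr)
    half = padding / 2
    peak_times = time_arr[signal_arr > threshold]
    # For each peak, the index range [start, stop) of samples inside its window.
    starts = np.searchsorted(time_arr, peak_times - half, side="left")
    stops = np.searchsorted(time_arr, peak_times + half, side="right")
    # Difference array: +1 where a window opens, -1 just past where it closes.
    delta = np.bincount(starts, minlength=n + 1) - np.bincount(stops, minlength=n + 1)
    # A running sum > 0 means the sample is covered by at least one window,
    # so overlapping windows are handled automatically.
    covered = np.cumsum(delta)[:n] > 0
    return time_arr[~covered], signal_arr[~covered]


time_arr = np.array([1e-6, 2e-6, 3e-6, 4e-6, 5e-6, 6e-6, 7e-6, 8e-6, 9e-6, 10e-6])
signal_arr = np.array([0, 10, 3, 2, 1, 0, 10, 2, 2, 5])
time_out, signal_out = remove_padded_peaks_sorted(time_arr, signal_arr, threshold=9, padding=3e-6)
print(signal_out)  # [2 1 2 5]
```

The two `searchsorted` calls cost O(n_peaks · log n) and the `bincount`/`cumsum` pass is O(n), regardless of how many peaks there are.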