31

Is there an existing implementation in Python for detecting steps in one dimensional data?

E.g. something that detects one step in this data:

single step

There are quite a few descriptions of algorithms out there but I am wondering if something suited for the job exists in Python?

I'm not sure if/how I should provide that data but here it is:

[ 594.          568.55555556  577.22222222  624.55555556  546.66666667
552.88888889  575.55555556  592.33333333  528.88888889  576.11111111
625.          574.22222222  556.33333333  567.66666667  576.66666667
591.66666667  566.33333333  567.33333333  547.44444444  631.11111111
555.66666667  548.66666667  579.44444444  546.88888889  597.55555556
519.88888889  582.33333333  618.88888889  574.55555556  547.44444444
593.11111111  565.66666667  544.66666667  562.66666667  554.11111111
543.88888889  602.33333333  609.77777778  550.55555556  561.88888889
719.33333333  784.44444444  711.22222222  843.66666667  691.33333333
690.11111111  684.33333333  749.11111111  759.11111111  653.33333333
817.11111111  705.22222222  689.44444444  712.33333333  659.
683.88888889  713.          740.44444444  692.22222222  677.33333333
681.44444444  640.          717.55555556  717.88888889  769.22222222
690.88888889  786.          774.66666667  799.44444444  743.44444444
789.88888889  673.66666667  685.66666667  709.88888889  645.55555556
846.11111111  792.77777778  702.22222222  749.44444444  678.55555556
707.55555556  665.77777778  643.55555556  671.44444444  795.66666667
627.22222222  684.55555556  708.44444444  829.66666667  719.        ]
John Crow
  • 927
  • 3
  • 13
  • 26
  • 1
    This is a really interesting question, but library recommendations are sadly off-topic on SO. Still, upvote from me. – errantlinguist Dec 28 '17 at 02:07
  • @errantlinguist I've slightly altered the wording so as to not ask for a module directly. Hopefully it doesn't get taken down! – John Crow Dec 28 '17 at 02:18
  • This reminds me of a similar question that was posted recently: https://stackoverflow.com/questions/47290732/group-numbers-in-an-array-by-step-value-changes/47293220#47293220 – pault Dec 28 '17 at 03:07
  • " quite a few descriptions of algorithms" links to a journal article, not an algorithm per se. the type of python algorithm you want is likely found in [scipy](https://docs.scipy.org/doc/scipy/reference/) – ShpielMeister Dec 28 '17 at 03:20
  • 1
    and since you have. tagged signal-processing [scipy-signal](https://docs.scipy.org/doc/scipy/reference/tutorial/signal.html) might be a good place for you to do your own research. you do need to understand your own objective. the question you've asked is far from trivial and as pointed out not quite appropriate for SO. – ShpielMeister Dec 28 '17 at 03:27
  • @ShpielMeister The description of the algorithm is in the supplementary information if you are interested. I did have a look at scipy-signal and didn't see anything appropriate. – John Crow Dec 28 '17 at 03:58
  • convolve with a step, see if peak resolution is good enough – f5r5e5d Dec 28 '17 at 04:51
  • Wikipedia has a page on [step detection algorithms](https://en.wikipedia.org/wiki/Step_detection). – Bill May 15 '21 at 18:23
  • Did you look at the answers to this question: [Detect steps in a Piecewise constant signal](https://stackoverflow.com/questions/19714663/detect-steps-in-a-piecewise-constant-signal). I think they are applicable here. – Bill Nov 05 '21 at 00:11

2 Answers2

31

convolve with a step, see if peak resolution is good enough

import numpy as np
from matplotlib import pyplot as plt


d = '''594.          568.55555556  577.22222222  624.55555556  546.66666667
552.88888889  575.55555556  592.33333333  528.88888889  576.11111111
625.          574.22222222  556.33333333  567.66666667  576.66666667
591.66666667  566.33333333  567.33333333  547.44444444  631.11111111
555.66666667  548.66666667  579.44444444  546.88888889  597.55555556
519.88888889  582.33333333  618.88888889  574.55555556  547.44444444
593.11111111  565.66666667  544.66666667  562.66666667  554.11111111
543.88888889  602.33333333  609.77777778  550.55555556  561.88888889
719.33333333  784.44444444  711.22222222  843.66666667  691.33333333
690.11111111  684.33333333  749.11111111  759.11111111  653.33333333
817.11111111  705.22222222  689.44444444  712.33333333  659.
683.88888889  713.          740.44444444  692.22222222  677.33333333
681.44444444  640.          717.55555556  717.88888889  769.22222222
690.88888889  786.          774.66666667  799.44444444  743.44444444
789.88888889  673.66666667  685.66666667  709.88888889  645.55555556
846.11111111  792.77777778  702.22222222  749.44444444  678.55555556
707.55555556  665.77777778  643.55555556  671.44444444  795.66666667
627.22222222  684.55555556  708.44444444  829.66666667  719.        '''

dary = np.array([*map(float, d.split())])

dary -= np.average(dary)

step = np.hstack((np.ones(len(dary)), -1*np.ones(len(dary))))

dary_step = np.convolve(dary, step, mode='valid')

# get the peak of the convolution, its index

step_indx = np.argmax(dary_step)  # yes, cleaner than np.where(dary_step == dary_step.max())[0][0]

# plots

plt.plot(dary)

plt.plot(dary_step/10)

plt.plot((step_indx, step_indx), (dary_step[step_indx]/10, 0), 'r')

enter image description here

f5r5e5d
  • 3,656
  • 3
  • 14
  • 18
  • 6
    This is incredible. Do you know any resource available online or a book that covers methods such as these? I don't mean just convolution or signal processing methods but applications of signal processing to detect outliers, anomalies and regime changes, like the one you just described? – nitred Dec 28 '17 at 21:13
  • argmax might be preferred to`np.where(dary_step == dary_step.max())[0][0]` – Brandon Dube Dec 28 '17 at 21:16
  • conolution is a mainstay of EE, my background - the peak could be negative too depending on the relative polarities of the steps in the convolution - so to be general there would need to be more case testing – f5r5e5d Dec 28 '17 at 21:21
  • Why was the array `dary` transformed to be 0-mean? – datapug May 03 '21 at 11:14
  • 2
    You can also use `-2 * np.cumsum(dary)` instead of `np.convolve(dary, step, mode='valid')`. It's a bit faster. – Bill May 15 '21 at 18:21
  • @nitred (4yrs later) and others looking for a book: [Detection of Abrupt Changes - Theory and Application](https://people.irisa.fr/Michele.Basseville/kniga/) (can be downloaded for free). – djvg Nov 04 '21 at 15:13
  • @Bill: `np.cumsum()`, exactly. Less code. The `-2` is not necessary either, that's just an artefact of the chosen kernel scale. – djvg Nov 04 '21 at 15:23
  • 1
    @datapug: The zero-mean transformation is a trick to get a peak that is easy to find using `argmax()` (just see what happens without the transformation). Basically the convolution with a step kernel is a cumulative sum (in this case a scaled one), so the zero-mean transformation ensures that the cumulative sum ends in zero. Note that this approach to step detection only works reliably if you already know there is exactly one step. Also see e.g. [the DSP guide](https://www.dspguide.com/ch7/1.htm). – djvg Nov 04 '21 at 16:13
  • @djvg is the technique in this post described in Detection of Abrupt Changes? This technique makes sense, but I’m trying to understand the theory behind it. The simplest algorithms in the book depend on the the new mean being known ahead of time. – jdizzle Mar 19 '22 at 23:19
  • @jdizzle: Not sure if it's discussed in "Detection of Abrupt Changes", but for better understanding of convolution techniques I would refer to the [DSP Guide](https://www.dspguide.com/ch7/1.htm). – djvg Mar 20 '22 at 15:30
  • What if I have several steps in my data? When applying your approach I still get a mountain-shaped function but the max just results in the biggest peak. The other steps just result in a very slight change in slope. I'd have to change my convolution function right? But how? – Novorodnas Apr 25 '23 at 19:33
1

I'm interested in this myself. I'm not an expert but, as suggested in this answer, if you de-noise the signal first, it might help distinguish the true step from the noise. The total variation denoising algorithm in scikit-image will do the trick:

import numpy as np
from matplotlib import pyplot as plt
from skimage.restoration import denoise_tv_chambolle


data = '''594.          568.55555556  577.22222222  624.55555556  546.66666667
552.88888889  575.55555556  592.33333333  528.88888889  576.11111111
625.          574.22222222  556.33333333  567.66666667  576.66666667
591.66666667  566.33333333  567.33333333  547.44444444  631.11111111
555.66666667  548.66666667  579.44444444  546.88888889  597.55555556
519.88888889  582.33333333  618.88888889  574.55555556  547.44444444
593.11111111  565.66666667  544.66666667  562.66666667  554.11111111
543.88888889  602.33333333  609.77777778  550.55555556  561.88888889
719.33333333  784.44444444  711.22222222  843.66666667  691.33333333
690.11111111  684.33333333  749.11111111  759.11111111  653.33333333
817.11111111  705.22222222  689.44444444  712.33333333  659.
683.88888889  713.          740.44444444  692.22222222  677.33333333
681.44444444  640.          717.55555556  717.88888889  769.22222222
690.88888889  786.          774.66666667  799.44444444  743.44444444
789.88888889  673.66666667  685.66666667  709.88888889  645.55555556
846.11111111  792.77777778  702.22222222  749.44444444  678.55555556
707.55555556  665.77777778  643.55555556  671.44444444  795.66666667
627.22222222  684.55555556  708.44444444  829.66666667  719.        '''


x = np.array(data.split()).astype('float')
x_std = (x - x.mean()) / x.std()
x_denoise = denoise_tv_chambolle(x_std, weight=1)  # adjust the parameters
x_step = -2*np.cumsum(x_denoise)
step_indicator = x_step == x_step.max()

n = x.shape[0]
plt.subplot(211)
plt.plot(range(n), x_std, label='standardized')
plt.plot(range(n), x_denoise, label='denoised (TV)')
plt.legend()
plt.grid()
plt.subplot(212)
plt.step(range(n), step_indicator)
plt.show()

Output:

Matplotlib pyplot figure with 2 subplots showing noisy and denoised signal and the step indicator

Note:

One caveat with these methods, you must be sure there is only one step in the signal (and there must be one).

Bill
  • 10,323
  • 10
  • 62
  • 85
  • Thanks! This was extremely helpful for me, even though in my case there are multiple steps (with a span of different possible values for each step). After adding some logic for my specific case, I wonder what is the purpose of this line in the code: `x_std = (x - x.mean()) / x.std()`. I noticed that denoising the raw data as is doesn't work as well as on `x_std`. @Bill – Gamow Drop May 06 '22 at 10:43
  • 1
    Not sure if standardizing the data is necessary. As long as you pick your step-detection threshold appropriately. B.t.w. there is a new Python library for smoothing and differentiation algorithms, including TVR, if you are interested: [PyNumDiff](https://github.com/florisvb/PyNumDiff) – Bill May 06 '22 at 17:06