Spotting the point at which data stabilises using standard Bash utilities

Question

I have a set of data (two columns of CSV numbers) which varies quite a bit initially and then stabilises around a certain number. I'm trying to spot, in an automated way using standard Bash utilities, the point at which the graph first seems to stabilise. As an ASCII graph, the data may look something like this:

y
^                                     ___ stabilised
| .                                  |
|  .                                 |
|    .    .                          |
|     .  . .    .       .            ▾
|      .    .  . .   . . .  . . . . . . . . . . . . . . . .
|             .   . .              .
------------------------------------------------------------> x

Note that the data do not reach a constant value, but fluctuate slightly around the stable value. Is there some way I may spot the first point at which the graph seems to become stable using standard Bash utilities?

+1 for the ASCII graph and a nicely framed problem, but this is not really in scope for Stack Overflow; we expect to see some code that we can help fix ;-). This level of discussion may be more appropriate for http://programmers.stackexchange.com/ (but only because I have seen similar questions directed there). Good luck. – shellter, Feb 27 '13 at 14:49

score 1 · Answer 1 · answered Feb 27 '13 at 13:20

What you want is to detect statistical stationarity, which is a pretty tough problem; research papers [1], [2], [3] have been written about it. First you will need to decide on an algorithm that can actually detect stationarity before you even begin to consider how to implement it in any programming language (be it Unix utilities, Python with numpy/scipy, or whatever you choose). Perhaps a good book on time-series analysis will help you here.

Michael Wild

Thanks for that. I'll read through the links you posted. Indeed, the general problem is certainly non-trivial. For my current problem, I suspect that I would be happy to use a parameter that is derived manually from the data. One approach may be to cycle backwards through the data, searching for the first point that lies outside the range defined by the parameter multiplied by the standard deviation of the points considered so far. What do you think? – d3pd Feb 27 '13 at 13:31
That could work, if you *know* that you have indeed reached statistical stationarity. BTW, I included the links only to illustrate that this is a difficult problem. Your proposal also potentially fails if there is a *creeping* trend in your data. – Michael Wild Feb 27 '13 at 14:24
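
For illustration, the backwards scan proposed in the comments could be sketched with awk roughly as follows. This is only a minimal sketch under stated assumptions, not a stationarity test: the function name `find_stable_start`, the multiplier `k` (default 3) and the warm-up size `min_points` (default 5) are hypothetical choices that do not come from the thread, and the approach assumes the tail of the data really is stable. As noted above, a creeping trend can defeat it.

```shell
#!/usr/bin/env bash
# find_stable_start FILE [K] [MIN_POINTS]   (hypothetical helper, not from the thread)
# Scans a two-column CSV (x,y) from the last row backwards and prints the x
# value of the earliest row that still belongs to the apparently stable tail.
find_stable_start() {
    local file=$1 k=${2:-3} min_points=${3:-5}
    awk -F, -v k="$k" -v min_points="$min_points" '
        # Keep only rows whose second field is numeric (skips headers, blanks).
        $2 ~ /^[ \t]*[-+]?[0-9]*\.?[0-9]+([eE][-+]?[0-9]+)?[ \t]*$/ {
            n++; x[n] = $1; y[n] = $2 + 0
        }
        END {
            if (n == 0) exit 1                   # no usable data
            if (min_points < 2) min_points = 2   # need 2+ points for a std dev
            count = 0; mean = 0; m2 = 0
            start = 1                            # default: whole series is stable
            for (i = n; i >= 1; i--) {
                if (count >= min_points) {
                    # Sample standard deviation of the points seen so far.
                    sd = sqrt(m2 / (count - 1))
                    d = y[i] - mean
                    if (d < 0) d = -d
                    # First point outside k standard deviations ends the tail.
                    if (d > k * sd) { start = i + 1; break }
                }
                # Welford running update of the mean and sum of squared deviations.
                count++
                delta = y[i] - mean
                mean += delta / count
                m2 += delta * (y[i] - mean)
            }
            print x[start]
        }
    ' "$file"
}
```

Calling `find_stable_start data.csv 3 5` (with `data.csv` standing in for your own file) prints the x value where the stable region appears to begin. Both `k` and `min_points` would have to be tuned by hand against the data, which matches the "parameter derived manually" idea in the comment.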
