
OK, so you have some historical data in the form of, say, an array of integers. This could, for example, represent free space on a server HDD over a two-year period, with each array element being a daily sample.

The data (free space in this example) has a downward trend, but also has periodic upward spikes where files have been removed, compressed, etc.

How would you go about identifying the overall trend for the two-year period, i.e., ironing out the peaks and troughs in the data?

Now, I did A-level statistics and then a stats module in my degree, but I've slept over 7,000 times since then, and well, it's leaked out of my brain.

I'm not after a bit of code as such, more of a description of how you'd approach this problem...

Thanks in advance!

Simon Catlin

2 Answers


You'll get many different answers, and the one you choose really depends on more specific requirements you may have. Examples:

  1. Low-pass filter, or any other spectral analysis technique, and use the low frequencies to determine trend.

  2. Linear regression (value against time): the slope of the fitted line gives the trend, and "r" (the correlation between time and the value) tells you how well a straight line fits.

  3. Moving average of the last "n" samples. If "n" is large enough this is my favorite, since it is often sufficient and is very easy to code. It's a sort of approximation to #1 above.

I'm sure there'll be others.
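As a rough illustration of options 2 and 3, here is a minimal Python sketch using only the standard library. The function names `moving_average` and `linear_trend`, the 7-day window, and the sample data are all made up for this example; treat it as one possible way to do it rather than a definitive implementation.

```python
def moving_average(samples, n):
    """Trailing moving average of the last n samples (shorter window at the start)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    smoothed = []
    window_sum = 0.0
    for i, value in enumerate(samples):
        window_sum += value
        if i >= n:
            window_sum -= samples[i - n]  # drop the sample that left the window
        smoothed.append(window_sum / min(i + 1, n))
    return smoothed


def linear_trend(samples):
    """Least-squares fit of value against day index; returns (slope, intercept, r)."""
    count = len(samples)
    if count < 2:
        raise ValueError("need at least two samples")
    mean_x = (count - 1) / 2
    mean_y = sum(samples) / count
    sxx = sum((x - mean_x) ** 2 for x in range(count))
    syy = sum((y - mean_y) ** 2 for y in samples)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(samples))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / (sxx * syy) ** 0.5 if syy > 0 else 0.0
    return slope, intercept, r


# Hypothetical data: free space falls by 2 units a day, with a clean-up every 10th day.
free_space = [1000 - 2 * day + (15 if day % 10 == 0 else 0) for day in range(60)]
slope, intercept, r = linear_trend(free_space)  # slope is the units lost per day
smoothed = moving_average(free_space, 7)        # 7-day window, chosen arbitrarily
```

The slope answers "how fast is free space shrinking overall", while the moving average gives a smoothed curve you can plot or inspect; a larger window irons out more of the spikes at the cost of reacting more slowly.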

Nitzan Shaked
  • Thank you, Nitzan. Shame I can't accept the answer for two posts. Vote++. – Simon Catlin Sep 07 '13 at 17:11
  • What if all I want is to know whether there is an underlying memory leak? I can iteratively calculate the moving average on the whole series but I cannot do moving averages. My guess is that it should be enough to detect a leak. – uuu777 Nov 23 '22 at 01:34
  • Sorry not "moving average" just average on the whole series – uuu777 Nov 23 '22 at 01:59

If I were doing this to produce a line through points for me to look at, I would probably use some variant of Loess, described at http://en.wikipedia.org/wiki/Local_regression and http://stat.ethz.ch/R-manual/R-patched/library/stats/html/loess.html. Basically, you find the smoothed value at any particular point by doing a weighted regression on the data points near that point, with the nearest points given the most weight.
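To make the idea concrete, here is a simplified Python sketch of that local weighted regression. It is not R's `loess` implementation: it assumes a straight-line local fit, tricube weights, and a `span` parameter giving the fraction of points used in each fit, and it skips the robustness iterations that real implementations offer. The names `loess_point` and `loess_curve` are invented for the example.

```python
def loess_point(xs, ys, x0, span=0.3):
    """Smoothed value at x0 from a tricube-weighted straight-line fit to nearby points."""
    count = len(xs)
    k = max(2, min(count, int(round(span * count))))  # neighbours used in the local fit
    nearest = sorted(range(count), key=lambda i: abs(xs[i] - x0))[:k]
    # Widen the bandwidth slightly so the farthest neighbour keeps a tiny non-zero weight.
    bandwidth = max(abs(xs[i] - x0) for i in nearest) * 1.001 or 1.0
    # Tricube weights: the nearest points get the most weight.
    weights = [(1 - (abs(xs[i] - x0) / bandwidth) ** 3) ** 3 for i in nearest]
    total = sum(weights)
    mean_x = sum(w * xs[i] for w, i in zip(weights, nearest)) / total
    mean_y = sum(w * ys[i] for w, i in zip(weights, nearest)) / total
    sxx = sum(w * (xs[i] - mean_x) ** 2 for w, i in zip(weights, nearest))
    sxy = sum(w * (xs[i] - mean_x) * (ys[i] - mean_y) for w, i in zip(weights, nearest))
    slope = sxy / sxx if sxx > 0 else 0.0
    return mean_y + slope * (x0 - mean_x)


def loess_curve(xs, ys, span=0.3):
    """Smoothed value at every sample position."""
    return [loess_point(xs, ys, x, span) for x in xs]


# Hypothetical data: day index against free space, with a clean-up every 10th day.
days = list(range(30))
free_space = [1000 - 2 * d + (15 if d % 10 == 0 else 0) for d in days]
trend = loess_curve(days, free_space, span=0.5)
```

A larger `span` gives a smoother curve that follows the long-term trend; a smaller one follows the local bumps more closely. This version sorts all points for every evaluation, which is fine for a couple of years of daily samples but would be slow on large series.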

mcdowella
  • This is exactly what I was looking for - a method to do what you'd do visually when trying to define a y=n.x style expression for a data set. Thank you. – Simon Catlin Sep 07 '13 at 17:12