
In a web application, I get a trigger every time an event occurs. I want to detect 'violent' frequency peaks, which probably translate into abnormal behaviour.

I can think of two naive ways of achieving that:

  • Fixed threshold - "If more than 500 events occur within a minute, something's probably wrong." This method cannot handle gradual threshold breaches or steadily increasing traffic unless the application adjusts the threshold periodically.

  • Window-related heuristic - Divide the window into N equal (?) intervals. While N > 0, calculate the frequency of events that happened in [now - (N*interval_length), now], save it in a list, decrease N by 1 and repeat. Then detect outliers in the list: if there is an outlier larger than the mean frequency of [now - window_length, now], something's probably wrong.
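A minimal Python sketch of both naive approaches, assuming each event arrives as a timestamp in seconds, in non-decreasing order. The names `SlidingWindowCounter`, `nested_window_rates` and `window_heuristic_alert`, and the `factor` parameter, are hypothetical. The question does not fix an outlier test, so this sketch simply treats a nested-window rate above `factor` times the whole-window rate as an outlier.

```python
from collections import deque


class SlidingWindowCounter:
    """Keeps the timestamps of events seen in the last `window` seconds."""

    def __init__(self, window: float) -> None:
        self.window = window
        self.timestamps: deque = deque()

    def add(self, t: float) -> None:
        # Assumes timestamps are appended in non-decreasing order.
        self.timestamps.append(t)

    def count(self, now: float) -> int:
        # Evict events that fell out of (now - window, now].
        while self.timestamps and self.timestamps[0] <= now - self.window:
            self.timestamps.popleft()
        return len(self.timestamps)


def fixed_threshold_alert(counter: SlidingWindowCounter, now: float, limit: int = 500) -> bool:
    """Approach 1: more than `limit` events inside the window."""
    return counter.count(now) > limit


def nested_window_rates(timestamps, now: float, interval: float, n: int) -> list:
    """Approach 2: events per second over (now - k*interval, now] for k = n, ..., 1."""
    rates = []
    for k in range(n, 0, -1):
        start = now - k * interval
        hits = sum(1 for t in timestamps if start < t <= now)
        rates.append(hits / (k * interval))
    return rates


def window_heuristic_alert(timestamps, now: float, interval: float, n: int,
                           factor: float = 3.0) -> bool:
    """Assumed outlier rule: a nested rate above `factor` x the whole-window rate."""
    rates = nested_window_rates(timestamps, now, interval, n)
    overall = rates[0]  # k = n covers the whole window [now - n*interval, now]
    return any(r > factor * overall for r in rates[1:])
```

The nested-window version rescans every timestamp for each k, which is fine for a sketch but costs O(N x events) per check.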

I'd like to know if there is instead a common/standard solution for this problem or if you can think of anything more efficient or elegant.

Thank you in advance.

EDIT -- Another suggestion

A friend of mine suggested aberrant behaviour detection with Holt-Winters forecasting. You can find more information about this methodology in the links below:

http://www.hpl.hp.com/news/events/csc/2005/jake_slides.pdf

http://www.usenix.org/events/lisa00/full_papers/brutlag/brutlag_html/
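For the Holt-Winters suggestion, here is a rough Python sketch of additive Holt-Winters forecasting with a Brutlag-style confidence band (the forecast plus or minus `delta` times a seasonally smoothed absolute deviation), applied to per-interval event counts. The function name, the default parameters, the naive initialisation from the first season, the reuse of `gamma` for the deviation, the `min_band` floor and the warm-up skip are all assumptions for illustration. The linked paper also counts band violations over a moving window before declaring a failure, whereas this sketch flags every single violation.

```python
def holt_winters_aberrations(y, m, alpha=0.5, beta=0.1, gamma=0.1,
                             delta=3.0, min_band=1.0):
    """Indices t where y[t] falls outside yhat[t] +/- max(delta * d[t-m], min_band).

    y: per-interval event counts; m: season length in intervals (len(y) > m).
    """
    # Naive start-up from the first season (assumption, not Brutlag's procedure).
    level = sum(y[:m]) / m
    trend = 0.0
    season = [y[i] - level for i in range(m)]  # c_{t-m} for each seasonal slot
    dev = [0.0] * m                             # d_{t-m} for each seasonal slot
    flagged = []
    for t in range(m, len(y)):
        s = t % m
        yhat = level + trend + season[s]  # one-step-ahead forecast
        err = y[t] - yhat
        # Skip the first forecast season while the deviations warm up.
        if t >= 2 * m and abs(err) > max(delta * dev[s], min_band):
            flagged.append(t)
        prev_level = level
        level = alpha * (y[t] - season[s]) + (1 - alpha) * (level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
        season[s] = gamma * (y[t] - level) + (1 - gamma) * season[s]
        dev[s] = gamma * abs(err) + (1 - gamma) * dev[s]
    return flagged
```

Here `m` is the season length in intervals, for example 1440 for per-minute counts with a daily cycle.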

sawidis

3 Answers


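I am not an expert. Here is what I would do:

Let's say you keep only the last n results, and x_n is the most recent sample (the time difference from the previous event).

$$T = \alpha_n x_n + \frac{\alpha_{n-1}}{2} x_{n-1} + \dots + \frac{\alpha_1}{2^{n-1}} x_1 = \sum_{i=1}^{n} \alpha_i \, 2^{-(n-i)} x_i$$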

If the difference T - T_{previous}, where T_{previous} is the previous value of T, exceeds a limit, raise an alert.
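A small Python sketch of this weighted sum, assuming the x_i are inter-arrival times computed from event timestamps. Because the inputs are gaps, a burst makes T drop rather than rise, so the sketch compares the magnitude |T - T_previous| with the limit. `weighted_gap_sum`, `detect_jumps` and `limit` are hypothetical names, and α_i defaults to 1.

```python
from collections import deque


def weighted_gap_sum(gaps, n, alphas=None):
    """T = sum_i alpha_i * 2**-(n - i) * x_i, with x_n the newest gap.

    gaps: up to n inter-arrival times, oldest first. alphas[i - 1] is alpha_i;
    defaults to alpha_i = 1 for every i.
    """
    if alphas is None:
        alphas = [1.0] * n
    total = 0.0
    for j, x in enumerate(reversed(gaps)):  # j = 0 is the newest gap (i = n)
        total += alphas[n - 1 - j] * x / 2 ** j
    return total


def detect_jumps(timestamps, n, limit, alphas=None):
    """Timestamps at which |T - T_previous| exceeds `limit`."""
    gaps = deque(maxlen=n)  # keeps only the last n gaps
    alerts = []
    prev_T = None
    for prev, cur in zip(timestamps, timestamps[1:]):
        gaps.append(cur - prev)
        T = weighted_gap_sum(gaps, n, alphas)
        # A burst shrinks the gaps, so T drops: compare the size of the change.
        if prev_T is not None and abs(T - prev_T) > limit:
            alerts.append(cur)
        prev_T = T
    return alerts
```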

If your x_i values are binary, you can use neat tricks with shift and OR operations if speed matters.
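One way to read the bit trick, assuming α_i = 1 and each x_i is 0 or 1 (for example, whether an event arrived in a given tick; that interpretation is hypothetical): multiplying T by 2^(n-1) turns it into an n-bit integer whose most significant bit is x_n, so adding a sample is one right shift plus one OR. The helper names are hypothetical, and a limit on T - T_previous must be scaled by 2^(n-1) to apply to the integer form.

```python
def push_bit(state: int, bit: int, n: int) -> int:
    """Slide an n-bit window: drop the oldest bit, store the newest as the top bit.

    With alpha_i = 1 and x_i in {0, 1}, state == T * 2**(n - 1), so the
    newest sample x_n is the most significant bit and x_1 the least.
    """
    return (state >> 1) | ((bit & 1) << (n - 1))


def register_from_bits(bits, n: int) -> int:
    """Feed samples oldest-first and return the final register value."""
    state = 0
    for b in bits:
        state = push_bit(state, b, n)
    return state
```

For n = 4 and samples 1, 1, 0, 1 (oldest first), T = 1 + 0/2 + 1/4 + 1/8 = 1.375, and the register holds 1.375 x 8 = 11 = 0b1011.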

Dimitris Leventeas
  • Thanks for answering =) Some questions: a) The more recent the event, the more weight it gets, right? b) What does 'α' denote? Couldn't I just adjust T so as to avoid the n multiplications? – sawidis Sep 02 '11 at 05:43
  • Good observations. I had forgotten the index on `α_i`; it is there in case you want a special weight for each sample. It could be `α_i = 1` for each `i`. If you mean shifting T to the right and adding the new value of x_n, you are right. – Dimitris Leventeas Sep 02 '11 at 10:14
  • And yes, more recent ==> more important. – Dimitris Leventeas Sep 02 '11 at 10:17

Just compute a simple average over the values of the last X minutes (keep the values).

Compare each new incoming value with the average:

  • If the difference is more than Y%, it's an outlier: raise an alert.
  • If less, add the value to the average and remove the oldest one, FIFO style.

If you think it can be tricked by 'steadily increasing traffic', make X sufficiently large.
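A minimal Python sketch of this rule, assuming the stored values are per-minute event counts (so `size` plays the role of X) and that Y is given as a fraction. `MovingAverageDetector` and its parameters are hypothetical names.

```python
from collections import deque


class MovingAverageDetector:
    """Keeps the last `size` accepted values, e.g. per-minute event counts."""

    def __init__(self, size: int, pct: float) -> None:
        self.values: deque = deque(maxlen=size)  # FIFO: appending drops the oldest
        self.pct = pct  # the answer's Y, as a fraction (0.5 means 50%)

    def observe(self, value: float) -> bool:
        """Return True (alert) if `value` deviates from the average by more than pct."""
        if self.values:
            avg = sum(self.values) / len(self.values)
            # When the average is 0 a relative difference is undefined; accept the value.
            if avg > 0 and abs(value - avg) / avg > self.pct:
                return True  # outlier: alert and keep it out of the average
        self.values.append(value)
        return False
```

As stated, outliers never enter the window, so a genuine, permanent jump in traffic keeps alerting until the stored values are reset.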

  • I think we care more about whether signals are more frequent now than immediately before, rather than compared with signals from `a while` ago, where `a while` is the length of the window. – Dimitris Leventeas Sep 02 '11 at 10:16

You can compute an exponentially weighted floating-mean estimator and compare it with its previous value. An abrupt increase is probably what you are trying to detect, but combine it with a minimum threshold (so that, e.g., a jump from 0 to 1 is not significant).

But if, say, the current floating mean jumps from 100 to 200, that is probably the kind of event you want to detect.
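A minimal Python sketch of this idea, assuming the input is a stream of per-interval event counts. `EwmaJumpDetector`, the jump `ratio` and the `min_level` floor are hypothetical names and values; the answer does not specify how large an increase counts as abrupt.

```python
class EwmaJumpDetector:
    """Exponentially weighted moving mean, alerting on abrupt increases."""

    def __init__(self, alpha: float, ratio: float, min_level: float) -> None:
        self.alpha = alpha          # weight of the newest value, 0 < alpha <= 1
        self.ratio = ratio          # assumed: alert if the mean grows by this factor in one step
        self.min_level = min_level  # assumed floor, so e.g. 0 -> 1 is not significant
        self.mean = None

    def update(self, value: float) -> bool:
        """Fold `value` into the mean; return True if the mean jumped."""
        if self.mean is None:
            self.mean = value  # the first sample initialises the estimator
            return False
        prev = self.mean
        self.mean = self.alpha * value + (1 - self.alpha) * prev
        return self.mean >= self.min_level and self.mean > self.ratio * prev
```

With alpha = 0.5, a mean of 100 moves to 200 in one step only if the new interval reports 300 events; a smaller alpha smooths more and reacts more slowly.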

Has QUIT--Anony-Mousse