I have a 15-minute sliding window and can aggregate over the data in it at any given time. Due to memory constraints, I can't increase the size of the window. Still, I think I should be able to get aggregates (like trending items, which is basically a frequency counter) over a day or a week.

It doesn't have to be a very accurate count; it just needs to pick out the top 3-5 items.

  1. Will running a cron job every 15 minutes and putting the results into 4 (15-minute) counters work?
  2. Can I update some kind of rolling counter over the aggregate?
  3. Is there any other method to do this?
  • What's the data? (A series of numeric values?) What "aggregates" do you want to compute? Some (like the sum or average) are easy to compute in O(1) space with a single pass, while others (like the median) are provably impossible to compute in O(1) space with a single pass. – j_random_hacker Feb 25 '16 at 14:14
  • Can you provide additional details? – Alessandro Cuttin Feb 25 '16 at 14:20

My suggestion is an exponentially decaying moving average, like the one used for the Unix load average. (See http://www.howtogeek.com/194642/understanding-the-load-average-on-linux-and-other-unix-like-systems/ for an explanation.)

Pick a constant 0 < k < 1, then update every 5 minutes as follows:

moving_average = k * average_over_last_5_min + (1-k) * moving_average

This behaves roughly like an average over the last 5/k minutes. So if you set k = 1/(24.0 * 60.0 / 5.0) = 0.00347222222222222 (that is, 1/288), you get roughly a daily moving average. Divide k by 7 and you get roughly a weekly moving average.
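To make the 5/k rule of thumb concrete, here is a small Python sketch that derives k from the update interval and the window you want. The helper name `k_for_window` is hypothetical, not from the answer. The interval is a parameter because the question's setup refreshes every 15 minutes rather than every 5; using 15 there is an assumption. The last line shows why the rule works: after 1/k updates, the weight still carried by the old value is (1 - k)^(1/k), which is close to 1/e ≈ 0.37 for small k, so data older than about 5/k minutes has mostly faded out.

```python
import math

def k_for_window(window_minutes, interval_minutes=5.0):
    """Hypothetical helper: decay constant k such that interval / k ≈ window."""
    return interval_minutes / window_minutes

MINUTES_PER_DAY = 24 * 60

k_day = k_for_window(MINUTES_PER_DAY)               # 5 / 1440 = 1/288 ≈ 0.003472
k_week = k_for_window(7 * MINUTES_PER_DAY)          # same as k_day / 7
k_day_15min = k_for_window(MINUTES_PER_DAY, 15.0)   # 1/96 if updating every 15 minutes

# Weight left on the value from one "window" ago, i.e. 1/k updates back.
residual = (1 - k_day) ** round(1 / k_day)
print(f"k_day={k_day:.6f} k_week={k_week:.7f} residual={residual:.3f} (1/e={math.exp(-1):.3f})")
```

For the daily k the residual weight comes out at about 0.367, right next to 1/e, which is what "behaves like an average over the last 5/k minutes" means in practice.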

The averages won't be exact, but should work perfectly well to identify what things are trending recently.
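Applied to the trending-items case from the question, the same update can be kept per item: each bucket contributes its counts, items absent from a bucket simply decay by (1 - k), and the top 3-5 items are the ones with the largest moving averages. Below is a minimal Python sketch under those assumptions. The class name `DecayingCounter` and the `prune_fraction` rule (forget an item once its average falls below a small fraction of k, i.e. well below what a single occurrence contributes) are my own additions to keep memory bounded, not part of the answer.

```python
import heapq
from collections import Counter

class DecayingCounter:
    """Hypothetical per-item exponentially decaying moving average of counts."""

    def __init__(self, k, prune_fraction=0.01):
        if not 0 < k < 1:
            raise ValueError("k must satisfy 0 < k < 1")
        self.k = k
        # Assumption: drop items whose average has decayed below this threshold.
        self.threshold = k * prune_fraction
        self.averages = {}  # item -> moving average of its per-bucket count

    def update(self, bucket_counts):
        """Apply moving_average = k * count + (1 - k) * moving_average to every item."""
        for item in set(self.averages) | set(bucket_counts):
            prev = self.averages.get(item, 0.0)
            new = self.k * bucket_counts.get(item, 0) + (1 - self.k) * prev
            if new < self.threshold:
                self.averages.pop(item, None)  # faded out: forget it to bound memory
            else:
                self.averages[item] = new

    def top(self, n):
        """Return the n items with the largest moving averages, highest first."""
        return heapq.nlargest(n, self.averages.items(), key=lambda kv: kv[1])

if __name__ == "__main__":
    daily = DecayingCounter(k=1 / 288)
    # In practice each call would be fed from the 15-minute window's frequency counts.
    daily.update(Counter(["a", "b", "a", "c"]))
    daily.update(Counter(["b", "b", "d"]))
    print(daily.top(3))
```

Feeding raw per-bucket counts instead of per-minute averages scales every item by the same constant, so the ranking, and therefore the top-n, is unchanged.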

btilly