One approach that comes to mind:
- Register a percolate query for each user with their settings, allowing them, for example, to route events containing the word "error" to the error level (a sketch is shown after this list).
- Index each event in a per-client index; if you have a lot of events per client, it may also be useful to have a per-client, per-level index, like events_clientId_alarm.
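A minimal sketch of the per-user percolator part, in Kibana Dev Tools style and assuming Elasticsearch 7.x (user-rules, target_level and the document IDs are made-up names):

# One index holds the registered user rules; it must map both the
# percolator field and the event fields the rules will query.
PUT /user-rules
{
  "mappings": {
    "properties": {
      "query": { "type": "percolator" },
      "target_level": { "type": "keyword" },
      "level": { "type": "keyword" },
      "log": { "type": "text" }
    }
  }
}

# A rule for one client: events whose log contains "error" go to level "error".
PUT /user-rules/_doc/client42-error-rule
{
  "target_level": "error",
  "query": { "match": { "log": "error" } }
}

# Percolate an incoming event to see which rules it matches.
GET /user-rules/_search
{
  "query": {
    "percolate": {
      "field": "query",
      "document": { "log": "disk error on node 3" }
    }
  }
}

The hits tell you which rules matched, and therefore which level/index the event should be stored in.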
Then the mapping of an event should be something like:
{
  "indexed_at": datetime,
  "level": keyword [fatal/error/debug/...],
  "log": string
}
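As a concrete create-index request (again just a sketch; events_clientId is a placeholder name):

PUT /events_clientId
{
  "mappings": {
    "properties": {
      "indexed_at": { "type": "date" },
      "level": { "type": "keyword" },
      "log": { "type": "text" }
    }
  }
}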
Then you will have a stream of events coming in to be percolated; once an event is percolated, you will know where to store it.
You can then use a Kibana/Grafana/etc. approach to monitor your indices' data and raise alarms if there are, say, 4 events with level error in the last 5 minutes.
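If you want to run that check yourself instead of (or in addition to) Kibana/Grafana, a count over the last 5 minutes could be as simple as this sketch (same placeholder names as above):

GET /events_clientId/_count
{
  "query": {
    "bool": {
      "filter": [
        { "term": { "level": "error" } },
        { "range": { "indexed_at": { "gte": "now-5m" } } }
      ]
    }
  }
}

If the returned count reaches your threshold (e.g. 4), fire the alarm.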
In the worst case you will have one index with roughly 8,640,000 * 365 (about 3.15 billion) documents (if you have only one user producing 100 events per second). That is a huge index, but it can be managed correctly by Elasticsearch by adding enough shards so that your searches/aggregations by log level and date stay fast.
The most important thing here is to know how your data will grow over time, because Elasticsearch doesn't let you add more primary shards to an existing index. So you need to consider how each customer's data will increase over time and estimate how many shards you will need to keep everything running smoothly.
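Because the number of primary shards is fixed at creation time, you set it in the create-index request (together with the mappings shown earlier); the numbers below are placeholders to be sized from your growth estimate:

PUT /events_clientId
{
  "settings": {
    "number_of_shards": 6,
    "number_of_replicas": 1
  }
}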
NOTE:
Depending on your agreements with your customers (e.g. whether they need the whole history of their event data), you can store one index per year per client, which lets you delete old data if required and allowed.
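For example, assuming per-client yearly indices named like events_clientId_2023 (the naming scheme is up to you), dropping an expired year is a single call:

DELETE /events_clientId_2022

You could also point an alias at the current year's index so the application never has to change the index name it writes to.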
Hope it helps; I worked on a similar project and took a similar approach to accomplish it.