Is there a way in Elasticsearch (5.x) to filter out values that are "unusual"?

We have some time series data we run aggregation queries against, but occasionally due to bad data, we get some abnormal values. Is there a way to filter those out? Anything really far away from the mean basically.

The problem is that we can't hard-code any specific values, because each graph has its own "baseline" average.

K2xL
  • would aggregations and percentiles not work? https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-metrics-percentile-aggregation.html – diginoise Jun 16 '17 at 16:06
  • @diginoise would that work in the following scenario? I'm doing a histogram aggregation for average latency per system per day, but for each system I want to ignore the outliers. – K2xL Jun 17 '17 at 12:37
  • do you mean that you create one histogram per day per system? Also what do you mean by average latency? It's a histogram of *all* latencies for a given system in a given day right? Which way does your bad data add outliers at (i.e. lower percentiles, or at the top) ? – diginoise Jun 19 '17 at 15:59
  • So each system reports latencies, usually they are between 50 and 200. Sometimes we get bad values, either something like 5 or something like 3011 due to bugs in the reporting system. Would like to be able to filter these out when making histogram aggregations – K2xL Jun 20 '17 at 19:34
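
One way to read the percentile suggestion from the comments is as a two-pass approach: first ask for a low and a high percentile of the latency per system, then use those values as a per-system range filter in the daily histogram. The following is only a rough sketch of the request bodies, built as plain Python dicts. The field names `latency`, `system` and `@timestamp`, the 5th/95th cut-offs, and the helper function names are all assumptions made for illustration, not something taken from the question. Note that this trims the tails even when the data is clean, and that Elasticsearch percentiles are approximate.

```python
def percentile_bounds_query(field="latency", system_field="system",
                            low=5, high=95):
    """First pass: per-system low/high percentiles of the latency field.

    Field names and the 5/95 cut-offs are assumptions for this sketch.
    """
    return {
        "size": 0,
        "aggs": {
            "per_system": {
                "terms": {"field": system_field},
                "aggs": {
                    "bounds": {
                        "percentiles": {"field": field,
                                        "percents": [low, high]}
                    }
                },
            }
        },
    }


def extract_bounds(response):
    """Turn the first-pass response into {system: (lower, upper)}."""
    bounds = {}
    for bucket in response["aggregations"]["per_system"]["buckets"]:
        values = bucket["bounds"]["values"]
        # Percentile keys come back as strings such as "5.0" and "95.0",
        # so sort them numerically rather than relying on exact spelling.
        ordered = sorted(values.items(), key=lambda kv: float(kv[0]))
        bounds[bucket["key"]] = (ordered[0][1], ordered[-1][1])
    return bounds


def trimmed_histogram_query(bounds, field="latency", system_field="system",
                            time_field="@timestamp"):
    """Second pass: daily average latency per system, outliers excluded."""
    filters = {}
    for system, (lower, upper) in bounds.items():
        filters[system] = {
            "bool": {
                "filter": [
                    {"term": {system_field: system}},
                    {"range": {field: {"gte": lower, "lte": upper}}},
                ]
            }
        }
    return {
        "size": 0,
        "aggs": {
            "per_system": {
                "filters": {"filters": filters},
                "aggs": {
                    "per_day": {
                        # "interval" is the 5.x spelling of this parameter.
                        "date_histogram": {"field": time_field,
                                           "interval": "day"},
                        "aggs": {"avg_latency": {"avg": {"field": field}}},
                    }
                },
            }
        },
    }
```

The body from `percentile_bounds_query()` would be sent as a search request first, its response passed through `extract_bounds()`, and the result fed to `trimmed_histogram_query()` for the second search. Because the bounds are computed per system, nothing has to be hard-coded per graph.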

0 Answers