
I'm working on an alerting solution that uses Logstash to stream AWS CloudFront logs from an S3 bucket into Graphite after doing some minor processing. Since multiple events with the same timestamp can occur (multiple events within a second), I elected to use Carbon Aggregator to count these events per second.

The problem I'm facing is that the aggregated whisper database seems to be dropping data. The normal whisper file sees all of it, but of course it cannot account for more than 1 event per second.

I'm running this setup in docker on an EC2 instance, which isn't hitting any sort of limit (CPU, Mem, Network, Disk).

I've checked every log I could find inside the Docker containers, as well as the output of `docker logs`, but nothing jumps out.

I've set the logstash output to display the lines on stdout (not missing any) and to send them to graphite on port 2023, which is set to be the line-by-line receiver for Carbon Aggregator:

[aggregator]
LINE_RECEIVER_INTERFACE = 0.0.0.0
LINE_RECEIVER_PORT = 2023
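For reference, a line receiver like this accepts Graphite's plaintext protocol: one `path value timestamp` line per datapoint. The following is a minimal Python sketch for reproducing the issue without Logstash. It assumes the aggregator is reachable on `localhost:2023`, and the helper names `build_lines` and `send_lines` are made up for illustration.

```python
import socket


def build_lines(path, timestamp, count):
    """Return `count` plaintext-protocol lines that all share one timestamp."""
    return [f"{path} 1 {int(timestamp)}\n" for _ in range(count)]


def send_lines(lines, host="localhost", port=2023):
    """Send the lines over a single TCP connection to the line receiver.

    host and port are assumptions; adjust them to the actual setup.
    """
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall("".join(lines).encode("ascii"))
```

Calling `send_lines(build_lines("test.alice.total", 1548237300, 5))` (with `test.alice.total` as a hypothetical metric name) would send five events for the same second, which makes it easy to compare what the raw and the aggregated series end up storing.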

aggregation-rules.conf is set to a very simple count per second:

test.<user>.total1s (1) = count test.<user>.total

storage-schemas.conf:

[default]
pattern = .*
retentions = 1s:24h

Happy to share more of my configuration as you request it.

I've hit a brick wall with this. I've tried many different things, but I still can't see all of the data in the aggregated whisper database.

Any help is very much appreciated.

1 Answer


Carbon aggregator isn't designed to do what you are trying to do. For that use-case you'd want to use statsd to count the events per second.

https://github.com/etsy/statsd/blob/master/docs/metric_types.md#counting

Carbon aggregator is meant to aggregate across different series. Each point it sees on the input is quantized to a timestamp before any aggregation happens, so you are still only going to get a single value per second with the aggregator. statsd, by contrast, will take any number of counter increments and total them up each interval.
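The difference between keeping one value per timestamp and counting can be illustrated with a small Python sketch. This is not Graphite or statsd code, and the event timestamps are invented; it only contrasts a store where each write to the same second replaces the previous one with a counter that totals every event in that second.

```python
from collections import Counter

# Invented epoch timestamps: three events in one second, one in the next.
events = [1548237300, 1548237300, 1548237300, 1548237301]

# One slot per timestamp: each write replaces the previous one.
last_write_wins = {}
for ts in events:
    last_write_wins[ts] = 1

# Counter-style aggregation: every event increments its second's total.
per_second = Counter(events)

print(last_write_wins)   # {1548237300: 1, 1548237301: 1}
print(dict(per_second))  # {1548237300: 3, 1548237301: 1}
```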

AussieDan
  • Thank you for your answer. I'm not sure `statsd` is the right tool in my case though as I need to be able to inject the log's original timestamp into the datapoint. Looking at the [Logstash Statsd Output](https://www.elastic.co/guide/en/logstash/current/plugins-outputs-statsd.html) there doesn't seem to be a way to do so like with the [Logstash Graphite Output](https://www.elastic.co/guide/en/logstash/current/plugins-outputs-graphite.html#plugins-outputs-graphite-timestamp_field) – Alexander Phoenix Jan 23 '19 at 09:55
  • Ah, yes in that case statsd won't work for you as it is intended to be used real-time. You may be able to use https://github.com/graphite-ng/carbon-relay-ng as it appears that it would aggregate multiple readings for the same metric during the same interval in the way you're expecting. – AussieDan Jan 28 '19 at 18:42