
I'm using Datadog's StatsD client to record the duration of a certain server response. I used to pass in quite a few custom tags when timing these responses, so I'm in the process of reducing the number of custom tags.

However, the problem is that when I reduce the number of tags passed in, the server response shows extra latency, which is counterintuitive because I'm passing in fewer tags and the implementation hasn't changed.

According to Datadog and Etsy (which originally released StatsD), the methods that record these metrics are non-blocking, so presumably they use some extra threads behind the scenes.

What could be the issue? Are there any possible side effects associated with using this client?

sshh

2 Answers


I can't speak specifically for the Java implementation, but in the C# client, the data is sent to the agent at 127.0.0.1 via UDP port 8125. This happens on the same thread as your executing code and is not asynchronous. Your process's entire effort is finished once the UDP message is sent: it's fired and immediately forgotten.

The thread overhead you mention occurs in the separate Datadog agent process, which listens on the other end of UDP 8125 and has its own thread pool and the ability to buffer some data before sending it up to Datadog's servers.

Do you have additional information that shows this behavior? Based on what I know, this doesn't sound like a side effect of the Datadog/StatsD stuff.
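To make the fire-and-forget point concrete, here is a minimal sketch of what a StatsD client does under the hood. The metric name and tags are hypothetical; the payload follows the standard DogStatsD wire format (`name:value|ms|#tag:value,...`), and the `send()` call returns as soon as the datagram is handed to the OS, with no reply awaited:

```java
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.nio.charset.StandardCharsets;

public class StatsdSketch {
    public static void main(String[] args) throws Exception {
        // DogStatsD timing payload: name:value|ms|#tag1:v1,tag2:v2
        // (metric name and tags here are made up for illustration)
        String payload = "server.response.time:123|ms|#region:us,endpoint:search";
        byte[] bytes = payload.getBytes(StandardCharsets.UTF_8);

        try (DatagramSocket socket = new DatagramSocket()) {
            // Fire-and-forget: send() hands the datagram to the OS and
            // returns immediately; nothing blocks waiting on the agent.
            socket.send(new DatagramPacket(bytes, bytes.length,
                    InetAddress.getLoopbackAddress(), 8125));
        }
        System.out.println("sent: " + payload);
    }
}
```

Because it's UDP, the send succeeds even if no agent is listening, which is why a missing or slow agent can't add latency to your request path.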

William Holroyd
  • I was actually able to find an answer here: ["How to graph percentiles in Datadog"](https://help.datadoghq.com/hc/en-us/articles/204588979-How-to-graph-percentiles-in-Datadog). – sshh Jan 16 '19 at 20:30

I found the answer on Datadog's help forum: "How to graph percentiles in Datadog".

  • Making a change to tag complexity (adding tags to be more specific, or removing them) changes the behavior of a rolled-up metric visualization.
    • Example: before the change, METRIC_NAME.avg (without any tags) aggregated across all raw points (StatsD takes all the raw datapoints, aggregates them, and ships a single metric stream). Adding a region tag (US, EU) causes StatsD to bin the raw datapoints into two region bins, aggregate each bin, and ship two streams. Graphing METRIC_NAME.avg with "AVG by *" then aggregates across the two streams rather than a single one.

So the gist is that the latency itself didn't go up; aggregating over multiple streams (one stream per custom tag combination) versus a single stream simply caused the graph to display a different shape.
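The difference between the two rollups can be shown with a little arithmetic. This is a toy illustration with made-up latency numbers, not the asker's actual data: averaging all raw points weights each request equally, while averaging per-tag streams weights each tag bin equally, so the two curves diverge whenever traffic is unevenly distributed across tags.

```java
public class RollupDemo {
    public static void main(String[] args) {
        // Hypothetical raw latencies (ms): three fast US requests, one slow EU request.
        double[] us = {10, 10, 10};
        double[] eu = {100};

        // No tags: StatsD aggregates all raw points into one stream.
        double sum = 0;
        int n = 0;
        for (double v : us) { sum += v; n++; }
        for (double v : eu) { sum += v; n++; }
        double overall = sum / n; // (10+10+10+100)/4 = 32.5

        // With a region tag: two streams, each already averaged per bin.
        double usAvg = 10.0;
        double euAvg = 100.0;

        // "AVG by *" over the tagged streams averages the two bins,
        // weighting each region equally regardless of request volume.
        double avgOfAvgs = (usAvg + euAvg) / 2.0; // 55.0

        System.out.println(overall + " vs " + avgOfAvgs);
    }
}
```

Dropping the region tag moves the graph from the 55.0-style curve back to the 32.5-style curve, which can look like a latency change even though no individual request got slower.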

sshh