20% of lost values after statsd

Question

I need to monitor a real-time application. It receives 60 connections per second, and I record 53 metrics for each connection.

So my simulation client sends 3180 metrics per second. I need the lower, upper, average, median and count_ps values, which is why I use the "timing" type.

When I look at count_ps in the StatsD output for a single metric, I see only 40 instead of 60. I can't find any information on StatsD's capacity. Maybe I am overloading it ^^

Could you help me? What are my options?

I can't reduce the number of metrics, but I don't need all the information provided by the "timing" type. Can I limit what "timing" outputs?

Thank you!

My configuration:

1) cat storage-schemas.conf

# Schema definitions for Whisper files. Entries are scanned in order,
# and first match wins. This file is scanned for changes every 60 seconds.
#
#  [name]
#  pattern = regex
#  retentions = timePerPoint:timeToStore, timePerPoint:timeToStore, ...

# Carbon's internal metrics. This entry should match what is specified in
# CARBON_METRIC_PREFIX and CARBON_METRIC_INTERVAL settings
[carbon]
pattern = ^carbon\.
retentions = 60:90d

[stats]
pattern = ^application.*
retentions = 60s:7d

2) cat dConfig.js

{
  graphitePort: 2003
, graphiteHost: "127.0.0.1"
, port: 8125
, backends: [ "./backends/graphite", "./backends/console" ]
, flushInterval: 60000
, debug: true
, graphite: { legacyNamespace: false, globalPrefix: "", prefixGauge: "", prefixCounter: "", prefixTimer: "", prefixSet: ""}
}

3) cat storage-aggregation.conf

# Aggregation methods for whisper files. Entries are scanned in order,
# and first match wins. This file is scanned for changes every 60 seconds
#
#  [name]
#  pattern = <regex>
#  xFilesFactor = <float between 0 and 1>
#  aggregationMethod = <average|sum|last|max|min>
#
#  name: Arbitrary unique name for the rule
#  pattern: Regex pattern to match against the metric name
#  xFilesFactor: Ratio of valid data points required for aggregation to the next retention to occur
#  aggregationMethod: function to apply to data points for aggregation
#
[min]
pattern = \.lower$
xFilesFactor = 0.1
aggregationMethod = min

[max]
pattern = \.upper$
xFilesFactor = 0.1
aggregationMethod = max

[sum]
pattern = \.sum$
xFilesFactor = 0
aggregationMethod = sum

[count]
pattern = \.count$
xFilesFactor = 0
aggregationMethod = sum

[count_legacy]
pattern = ^stats_counts.*
xFilesFactor = 0
aggregationMethod = sum

[default_average]
pattern = .*
xFilesFactor = 0.3

4) Client :

#!/usr/bin/env python
import time
import random
import statsd
import math

c = statsd.StatsClient('localhost', 8125)
k = 0
nbData = 60
pause = 1

while True:
    print k
    k += pause
    tps1 = time.clock()
    for j in range(nbData):
        digit = j % 10 + k * 10 + math.sin(j / 500)
        c.timing('TPS.global', digit)
        c.timing('TPS.interne', digit)
        c.timing('TPS.externe', digit)
        for i in range(5):
            c.timing('TPS.a.' + str(i), digit)
            c.timing('TPS.b.' + str(i), digit)
            c.timing('TPS.c.' + str(i), digit)
            c.timing('TPS.d.' + str(i), digit)
            c.timing('TPS.e.' + str(i), digit)
            c.timing('CR.a.' + str(i), digit)
            c.timing('CR.b.' + str(i), digit)
            c.timing('CR.c.' + str(i), digit)
            c.timing('CR.d.' + str(i), digit)
            c.timing('CR.e.' + str(i), digit)
    tps2 = time.clock()
    print 'temps = ' + str(tps2 - tps1)
    if k >= 60:
        k = 0
    if pause - tps2 + tps1 < 1:
        time.sleep(pause - tps2 + tps1)

Edit: added the client code.

score 0 · Answer 1 · answered Jul 01 '13 at 12:30

Without more context it's hard to say what could be going on. Do you use sampling when sending data to StatsD? What hardware are you running StatsD on? Was your simulation all on localhost? Did you run it over a lossy connection?
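
To make the sampling question concrete: a sampled timing carries a `|@rate` suffix on the wire. Below is a minimal sketch, assuming the usual StatsD line format `name:value|ms|@rate` sent over plain UDP. `format_timing` and `send_sampled` are hypothetical helper names for illustration, not part of any client library.

```python
import random
import socket


def format_timing(name, value_ms, rate=1.0):
    """Build one StatsD timing line; append |@rate only when sampling."""
    line = "{}:{}|ms".format(name, value_ms)
    if rate < 1.0:
        line += "|@{}".format(rate)
    return line


def send_sampled(sock, addr, name, value_ms, rate=1.0):
    """Send the timing with probability `rate`; return True if it was sent."""
    if rate < 1.0 and random.random() >= rate:
        return False  # dropped on the client side by sampling
    sock.sendto(format_timing(name, value_ms, rate).encode("ascii"), addr)
    return True


def make_udp_socket():
    """Create the UDP socket a caller would pass to send_sampled."""
    return socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
```

A caller would do something like `send_sampled(make_udp_socket(), ("127.0.0.1", 8125), "TPS.global", 12, rate=0.5)`. The client in the question passes no rate, so as far as I can tell it sends every timing unsampled.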

At the moment there is no way of limiting timing metrics to only certain types.

Sorry not to be of more immediate help. If your problems persist, consider dropping into #statsd on Freenode IRC and asking there.

score 0 · Answer 2 · answered Jul 05 '13 at 16:08

What is your CARBON_METRIC_INTERVAL set to? I suspect it needs to match the StatsD flushInterval.
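
CARBON_METRIC_INTERVAL is not shown in the question, but the intervals that are posted can be compared directly. This is a rough sketch using only the values from dConfig.js and storage-schemas.conf. `retention_seconds` is a hypothetical helper, and the second half assumes count_ps is the number of timings received during one flush window divided by the window length in seconds.

```python
def retention_seconds(retention):
    """Seconds per point of the first archive in a Whisper retention string,
    e.g. "60s:7d" or "60:90d" (a bare number is taken as seconds)."""
    units = {"s": 1, "m": 60, "h": 3600, "d": 86400, "w": 604800}
    first_archive = retention.split(",")[0].strip()
    per_point = first_archive.split(":")[0]
    if per_point[-1].isdigit():
        return int(per_point)
    return int(per_point[:-1]) * units[per_point[-1]]


flush_interval_ms = 60000                  # flushInterval in dConfig.js
flush_seconds = flush_interval_ms / 1000.0
resolution = retention_seconds("60s:7d")   # [stats] rule in storage-schemas.conf

# True when StatsD flushes exactly once per Whisper data point.
intervals_match = flush_seconds == resolution

# Assumption: count_ps = timings received in the window / window length.
observed_count_ps = 40
expected_count_ps = 60
received_per_flush = observed_count_ps * flush_seconds
expected_per_flush = expected_count_ps * flush_seconds
missing_fraction = 1 - received_per_flush / expected_per_flush

print(intervals_match, received_per_flush, expected_per_flush, missing_fraction)
```

With the posted values the flush interval and the [stats] retention both come to 60 seconds. Under the stated assumption, a count_ps of 40 corresponds to 2400 timings per flush for that metric instead of the expected 3600.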

Casey Watson
