I'm trying to pull some metrics into Graphite over a RabbitMQ exchange. I've got my publishers merrily publishing data to an exchange called metrics, and I've configured carbon.conf with the following:

ENABLE_AMQP = True
AMQP_HOST = hostname
AMQP_PORT = 5672
AMQP_VHOST = /vhost
AMQP_USER = user
AMQP_PASSWORD = password
AMQP_EXCHANGE = metrics
AMQP_METRIC_NAME_IN_BODY = True

The RabbitMQ installation is a two-node cluster behind HAProxy.

When this works, it works great. However, quite often, carbon logs the following error:

02/05/2013 15:13:14 :: [console] Unhandled error in Deferred:
02/05/2013 15:13:14 :: [console] Unhandled Error
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/twisted/internet/defer.py", line 421, in errback
    self._startRunCallbacks(fail)
  File "/usr/local/lib/python2.7/dist-packages/twisted/internet/defer.py", line 488, in _startRunCallbacks
    self._runCallbacks()
  File "/usr/local/lib/python2.7/dist-packages/twisted/internet/defer.py", line 575, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
  File "/usr/local/lib/python2.7/dist-packages/twisted/internet/defer.py", line 1126, in gotResult
    _inlineCallbacks(r, g, deferred)
--- <exception caught here> ---
  File "/usr/local/lib/python2.7/dist-packages/twisted/internet/defer.py", line 1068, in _inlineCallbacks
    result = result.throwExceptionIntoGenerator(g)
  File "/usr/local/lib/python2.7/dist-packages/twisted/python/failure.py", line 389, in throwExceptionIntoGenerator
    return g.throw(self.type, self.value, self.tb)
  File "/opt/graphite/lib/carbon/amqp_listener.py", line 70, in connectionMade
    yield self.receive_loop()
  File "/usr/local/lib/python2.7/dist-packages/twisted/internet/defer.py", line 1068, in _inlineCallbacks
    result = result.throwExceptionIntoGenerator(g)
  File "/usr/local/lib/python2.7/dist-packages/twisted/python/failure.py", line 389, in throwExceptionIntoGenerator
    return g.throw(self.type, self.value, self.tb)
  File "/opt/graphite/lib/carbon/amqp_listener.py", line 102, in receive_loop
    msg = yield queue.get()
  File "/usr/local/lib/python2.7/dist-packages/twisted/internet/defer.py", line 575, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
  File "/usr/local/lib/python2.7/dist-packages/txamqp/queue.py", line 32, in _raiseIfClosed
    raise Closed()
txamqp.queue.Closed:

02/05/2013 15:13:14 :: [console] <twisted.internet.tcp.Connector instance at 0x2219f80> will retry in 1976 seconds
02/05/2013 15:13:14 :: [console] Stopping factory <carbon.amqp_listener.AMQPReconnectingFactory instance at 0x2214ab8>

Somehow, the connection got dropped. What's worse is that it won't try to reconnect for about half an hour (1976 seconds)!

How do I

  1. Find out why it's disconnecting?
  2. Massively reduce the reconnect time?

Software:

txAMQP==0.6.2
graphite 0.9.11
RabbitMQ 3.1.0
Haproxy 1.4.18
growse

1 Answer

We experienced the same issue today. I'm not sure about #1, but I believe the cause of the second problem is that the reconnection delay is never reset in amqp_listener.py, so it keeps growing with every dropped connection. It should be reset before the protocol is built in buildProtocol. I submitted a pull request here: https://github.com/graphite-project/carbon/pull/102. Hope this helps.

Before the change (exceptions omitted):

console.log.2013_5_2:02/05/2013 17:11:14 :: will retry in 2 seconds
console.log.2013_5_2:02/05/2013 17:11:16 :: will retry in 5 seconds
console.log.2013_5_2:02/05/2013 17:41:18 :: will retry in 12 seconds
console.log.2013_5_2:02/05/2013 18:11:22 :: will retry in 28 seconds
console.log.2013_5_2:02/05/2013 18:41:26 :: will retry in 77 seconds
console.log.2013_5_2:02/05/2013 19:11:32 :: will retry in 178 seconds
console.log.2013_5_2:02/05/2013 19:41:39 :: will retry in 455 seconds
console.log.2013_5_2:02/05/2013 20:11:48 :: will retry in 967 seconds
console.log.2013_5_2:02/05/2013 20:42:01 :: will retry in 1831 seconds
console.log.2013_5_2:02/05/2013 21:22:13 :: will retry in 3375 seconds

After the change (exceptions omitted):

console.log.2013_5_2:02/05/2013 21:42:21 :: will retry in 2 seconds
console.log.2013_5_2:02/05/2013 21:42:24 :: will retry in 9 seconds
console.log.2013_5_2:02/05/2013 22:12:18 :: will retry in 2 seconds
console.log.2013_5_2:02/05/2013 22:12:21 :: will retry in 9 seconds
console.log.2013_5_2:02/05/2013 22:42:32 :: will retry in 2 seconds
console.log.2013_5_2:02/05/2013 22:42:35 :: will retry in 7 seconds
console.log.2013_5_2:02/05/2013 23:12:29 :: will retry in 2 seconds
console.log.2013_5_2:02/05/2013 23:12:32 :: will retry in 5 seconds
console.log.2013_5_2:02/05/2013 23:42:38 :: will retry in 2 seconds
console.log.2013_5_2:02/05/2013 23:42:41 :: will retry in 6 seconds
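
To illustrate why the delay keeps growing, here is a rough standard-library sketch of exponential backoff with and without a reset. It is not the carbon or Twisted code: BackoffSketch, simulate and their parameters are hypothetical names, the growth factor and the one-hour cap are assumptions chosen to resemble the logs above, and jitter is left out so the numbers are repeatable. In Twisted itself, the equivalent reset is ReconnectingClientFactory.resetDelay(), which I'm assuming carbon's AMQPReconnectingFactory inherits.

```python
class BackoffSketch:
    """Hypothetical stand-in for a reconnecting factory's delay bookkeeping."""

    def __init__(self, initial_delay=1.0, factor=2.718281828, max_delay=3600.0):
        # All three defaults are assumptions made for this illustration.
        self.initial_delay = initial_delay
        self.factor = factor
        self.max_delay = max_delay
        self.delay = initial_delay

    def retry(self):
        # Grow the delay geometrically, capped at max_delay.
        self.delay = min(self.delay * self.factor, self.max_delay)
        return self.delay

    def reset_delay(self):
        # Start over from the initial delay after a successful connection.
        self.delay = self.initial_delay


def simulate(drops, reset_on_connect):
    """Return the retry delay scheduled after each of `drops` disconnects."""
    backoff = BackoffSketch()
    delays = []
    for _ in range(drops):
        delays.append(backoff.retry())  # connection lost, schedule a retry
        if reset_on_connect:
            backoff.reset_delay()       # reconnect succeeded, forget the history
    return delays


if __name__ == "__main__":
    print("without reset:", [round(d) for d in simulate(10, False)])
    print("with reset:   ", [round(d) for d in simulate(10, True)])
```

Without the reset, every drop inherits the delay built up by all earlier drops, even when each reconnect succeeded straight away. With the reset, each drop starts again from the short initial delay.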

gdeangel