
I have a situation where I am processing events that are related to specific sources. Each source has a key or ID, which I can use as the hash. Events from each source have to be processed in order, but events from different sources can be parallelized, to achieve horizontal scalability. There will be hundreds of source keys.

I am planning to set the key as part of the routing key when submitting messages to RabbitMQ, and then use the consistent-hash-exchange so that events from the same source are routed to the same queue. I was then thinking of dynamically binding private queues from consumers, with a TTL (so that they are gracefully removed if a consumer is down). At the beginning I will just have 2 or 3 consumers for redundancy, but if I want to scale up due to an increased number of messages, I can just start another consumer.
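To make the intended routing concrete, here is a minimal plain-Python sketch (no broker) of the property the consistent-hash exchange provides: a given routing key always maps to the same queue, so per-source order is preserved as long as that queue exists. The queue names, event data, and the MD5-modulo mapping are illustrative assumptions; the real plugin uses weighted bindings on a hash ring, not a simple modulo.

```python
# Hypothetical sketch: simulate consistent-hash routing of events to queues.
# Not RabbitMQ's actual algorithm; it only demonstrates the ordering property.
import hashlib
from collections import defaultdict

def queue_for(routing_key: str, queues: list) -> str:
    """Deterministically pick a queue for a routing key."""
    h = int(hashlib.md5(routing_key.encode()).hexdigest(), 16)
    return queues[h % len(queues)]

queues = ["consumer-1", "consumer-2", "consumer-3"]  # assumed names
events = [("sensor-A", 1), ("sensor-B", 1), ("sensor-A", 2), ("sensor-B", 2)]

delivered = defaultdict(list)
for key, seq in events:
    delivered[queue_for(key, queues)].append((key, seq))

# All events for a given key land on one queue, in publish order.
for q, msgs in sorted(delivered.items()):
    print(q, msgs)
```

The fault-tolerance question in the rest of the post is exactly about what happens when this mapping changes, i.e. when one of the queues disappears.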

My question is what happens if a consumer is down and there are messages in its queue? Ideally I would want the messages in the queue to be rerouted back to the exchange, with the consistent-hash-exchange routing them to a different queue (since the original queue would be no longer there).

The RabbitMQ documentation about dead lettering doesn't explicitly mention the scenario of TTL on consumer queues, or what happens when the queue gets deleted.

Does my approach make sense? How can I achieve the consumer fault-tolerance I am looking for while retaining the ordering by a specific routing key?

Note: I know there is an even more subtle race condition: if, while dead-lettered messages are being rerouted through the exchange, new messages arrive that would originally have been routed to the expired queue, they will now go to a different consumer, so ordering will be broken at that instant.


1 Answer


There is more than one question to answer here; I'll try to go in the same order.

My question is what happens if a consumer is down and there are messages in its queue?
Outside of the context (rest of the question) - messages stay in the queue until they are ACKed or their TTL expires.

The RabbitMQ documentation about dead lettering doesn't explicitly mention the scenario of TTL on consumer queues, or what happens when the queue gets deleted.
It does say ...The TTL for the message expires..., so basically if the message is not ACKed within the given TTL, it gets sent to the DLX. For the queue TTL, check this link - it's basically an "expiry time" for the queue. Additionally, if the queue gets deleted, its messages are gone (not taking any mirroring into account, of course).
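The interplay between the two TTLs can be summarized as the x-arguments you would set when declaring a consumer's private queue: the message TTL must be shorter than the queue TTL so that stranded messages are dead-lettered before the idle queue itself is deleted (and its contents lost). The concrete values and the exchange name "events.dlx" are assumptions for the example.

```python
# Illustrative queue arguments only; with pika this would be passed as
#   channel.queue_declare(queue="consumer-1", arguments=queue_args)
queue_args = {
    "x-message-ttl": 20_000,                 # message TTL (ms): dead-letter after 20 s
    "x-expires": 30_000,                     # queue TTL (ms): delete idle queue after 30 s
    "x-dead-letter-exchange": "events.dlx",  # assumed DLX name for expired messages
}

# Invariant: messages expire (and get dead-lettered) before the queue does.
print(queue_args["x-message-ttl"] < queue_args["x-expires"])
```

Note that this only narrows the window discussed later in the comments; a message arriving just before the queue expires can still be deleted with it.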

Now for the "does it make sense" part. For messages from different sources, I think it's clear - process as much as you can in parallel and that's it. There are no collisions (well, usually none) there.

How can I achieve the consumer fault-tolerance I am looking for while retaining the ordering by a specific routing key?
For sequential processing, basically you need exactly one consumer per source. To monitor that consumer, maybe add a watchdog that restarts it if it crashes or hangs. It might also make sense to use the get instead of the consume (amqp) method. I can't really recommend this approach or advise against it, because (for me at least) it's quite use-case specific (performance, how often a new message arrives, etc.), but I would say it makes a "more synchronous" behavior easier to achieve.
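The watchdog idea above can be sketched as a small supervision loop; in a real deployment this role is usually played by systemd, supervisord, or a container restart policy. The consume_once callable below is a stand-in assumption for whatever the consumer's main loop is.

```python
# Minimal watchdog sketch: run a consumer callable, restarting it a
# bounded number of times if it crashes.
import itertools

def supervise(consumer, max_restarts=3):
    """Run consumer; restart it up to max_restarts times on crash."""
    for attempt in itertools.count():
        try:
            consumer()
            return attempt          # consumer exited cleanly
        except Exception as exc:
            if attempt >= max_restarts:
                raise               # give up and surface the failure
            print(f"consumer crashed ({exc}); restarting")

# Simulated consumer: crashes once, then exits cleanly.
crashes = iter([RuntimeError("lost connection"), None])

def consume_once():
    exc = next(crashes)
    if exc:
        raise exc

restarts = supervise(consume_once)
print("restarts needed:", restarts)
```

Note this only covers consumer-process crashes on a live machine; it does not help when the whole host dies, which is the scenario raised in the comments below.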

And for sure (now referring to what you wrote in the note), you should try to avoid DLX-ing messages (higher TTL, etc.) if you really want to keep the original order of the sequence (said redundantly on purpose :) )

  • Thanks for your reply. With regards to frequency it is quite a high number of messages on peak times. I might be over-engineering this a bit. What I am looking for is a sort of automatic re-routing of the messages if the consumer just disappears. I guess I could have the message TTL shorter than the queue TTL, so that if the consumer gets disconnected the messages are DLXed first, prior to the queue TTL expiring and getting deleted. Yes, there is still the issue of ordering DLXed messages and the race condition with new messages from the same source key. – jbx Oct 03 '16 at 14:21
  • You are welcome. As I said, I would solve this "on the consuming side" with a watchdog for the consumer. You want one consumer in order to keep processing sequential, but you also want it running :) – cantSleepNow Oct 03 '16 at 14:23
  • Yeah, my worry is more about what happens if the machine running the consumer dies and more than a few minutes are needed to recover. To be fair, if the queue has a relatively short TTL (for example 30 seconds) and the consumer doesn't reconnect, then there will potentially be 30 seconds' worth of new messages. The hypothesis I have on the DLXing doesn't work, since a message could come in at the last second before the queue TTL expires. Alternatively I could delete the queue immediately after client disconnection, so that only a few messages are lost... not ideal to lose messages :( – jbx Oct 03 '16 at 14:34