
I have a scenario in my RabbitMQ setup that I'm not sure how to solve. The diagram below illustrates it (exchanges and most queues removed for succinctness):

[Diagram: a producer feeding two Message A consumers, both of which publish to a single Message B consumer]

Scenario

  1. The producer creates message A(1). It is received by the top consumer, which begins processing it.
  2. The producer creates message A(2). It is received by the bottom consumer (assuming both consumers are on a round-robin exchange).
  3. The bottom consumer publishes message B(2), which is put into the Message B consumer's queue.
  4. The poor slow top consumer finally finishes and emits its message B(1).

Problem

If we assume that consumer B cannot be made idempotent, how do we ensure that the results of both B messages are applied in the correct order?

I had thought of stamping each message A with a timestamp when it is first published and having the consumer track the timestamp of the last applied change, rejecting any message with an earlier timestamp. But that only works if every message causes exactly the same kind of change, and it requires a lot of tracking.
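To make the timestamp idea concrete, here is a minimal Python sketch of it. The class name `LastWriteGuard` and the method `should_apply` are hypothetical names used only for illustration, and the timestamp is assumed to be whatever value the producer attached to message A and that was copied onto message B.

```python
class LastWriteGuard:
    """Tracks the timestamp of the last applied change (hypothetical helper)."""

    def __init__(self):
        self.last_applied = None  # no change applied yet

    def should_apply(self, timestamp):
        """Return True and record the timestamp unless it is older than the last one."""
        if self.last_applied is not None and timestamp < self.last_applied:
            return False  # stale message: reject it
        self.last_applied = timestamp
        return True


guard = LastWriteGuard()
fast = guard.should_apply(2)  # B(2) arrives first and is applied
slow = guard.should_apply(1)  # B(1) arrives late and is rejected
```

In the scenario above, B(1) is simply dropped, which is only acceptable if a later message fully supersedes an earlier one.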

Other ideas for how to approach this would be appreciated. Thanks!

Paul Kirby
  • Your question seems related to [RabbitMQ wait for multiple queues to finish](http://stackoverflow.com/questions/13861459/rabbitmq-wait-for-multiple-queues-to-finish) – Nicolas Labrot Apr 09 '15 at 18:50

1 Answer


I am not sure what is specific to RabbitMQ here, but the idea with timestamps sounds like a good start if you have a single producer.

The producer attaches a timestamp to each message A, and each message B takes the timestamp of its respective message A.

With your approach some messages would not be processed, e.g., message B(1). If all messages should be processed by consumer B, but in a deterministic order, then you can do a deterministic merge:

Consumer B is equipped with two queues, one queue for each consumer A. Consumer B always checks the top of both queues:

  • if both queues are non-empty, consumer B pops the message with the lowest timestamp.
  • if at least one queue is empty, the consumer B waits.

With this approach the order in which consumer B processes messages is given by the timestamps of the producer and no message is discarded. Assumptions are:

  • queues are FIFO
  • no process crashes
  • each consumer A eventually processes a message (so none of consumer B's queues stays empty forever)
  • consumer B can check the top of the queues in a non-blocking fashion
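The merge rule above can be sketched in a few lines of Python. This is only an in-memory illustration under the listed assumptions, not RabbitMQ client code: the class `DeterministicMerger`, its methods `offer` and `poll`, and the source names `"top"` and `"bottom"` are hypothetical, and the deques stand in for the two queues that feed consumer B.

```python
from collections import deque


class DeterministicMerger:
    """Merges one FIFO queue per consumer A by producer timestamp (hypothetical helper)."""

    def __init__(self, source_ids):
        # One FIFO buffer per upstream consumer A.
        self.queues = {sid: deque() for sid in source_ids}

    def offer(self, source_id, timestamp, payload):
        """Enqueue a message B that arrived from the given consumer A."""
        self.queues[source_id].append((timestamp, payload))

    def poll(self):
        """Return the next (timestamp, payload), or None if consumer B must wait."""
        if any(not q for q in self.queues.values()):
            return None  # at least one queue is empty: wait
        # All queues are non-empty: take the head with the lowest timestamp.
        source_id = min(self.queues, key=lambda sid: self.queues[sid][0][0])
        return self.queues[source_id].popleft()


merger = DeterministicMerger(["top", "bottom"])
merger.offer("bottom", 2, "B(2)")
first = merger.poll()   # None: the top queue is still empty
merger.offer("top", 1, "B(1)")
second = merger.poll()  # (1, "B(1)")
third = merger.poll()   # None: B(2) waits until the top queue has a message again
```

The last call shows why the third assumption matters: B(2) is held back until the slow consumer delivers something, because a message with a lower timestamp could still arrive from it.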
danyhow
  • Thanks for this, very interesting idea. Unfortunately it would be very difficult to implement in our particular situation as we can't guarantee assumption 3 (eventually each consumer A produces a message) - the model above was slightly simplified and the number of consumer As can dynamically scale, so consumer B's queues would have to scale in tandem with that. – Paul Kirby Apr 10 '15 at 15:00
  • That makes things harder indeed. Some suggestions: (1) If the set of consumers A is fixed but they are sometimes silent (don't send messages), you could solve that by having the producer send periodic heartbeats and forcing each consumer A to forward a control message downstream, just to fill its queue in consumer B. – danyhow Apr 10 '15 at 17:30
  • (2) If the set of consumers A changes, the approach of this answer would require you to keep a consistent membership set somewhere, e.g., in the producer itself or in a ZooKeeper ensemble. Consumer B would have to be told which group of consumers A was active up to which timestamp. Again, a heartbeat from the producer down to consumer B could carry such membership and timestamp information. (3) Regarding scaling queues in B: B could still have a single input queue and simply keep a set of internal buffers for messages that have been dequeued from the input queue but cannot yet be processed. – danyhow Apr 10 '15 at 17:31
  • In my opinion the only viable solution is to create an aggregator that aggregates and reorders messages B(X). The aggregator should be backed by a datastore (memory and/or disk, e.g. Redis or an RDBMS). To be able to aggregate and reorder messages, an identifier (UUID), an order and the count should be associated with messages A(X). The aggregator will release messages B(X) to consumer B with a guarantee of ordering. – Nicolas Labrot Apr 10 '15 at 18:38
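
As a rough illustration of the aggregator suggested in the last comment, the following Python sketch buffers out-of-order messages and releases them in sequence. It is a sketch under assumptions, not a definitive design: the name `ReorderBuffer` and its `accept` method are hypothetical, the producer is assumed to assign gap-free sequence numbers starting at 1 to messages A, each message A is assumed to yield exactly one message B, and a plain dict stands in for the backing datastore.

```python
class ReorderBuffer:
    """Holds messages B until they can be released in sequence order (hypothetical helper)."""

    def __init__(self, first_seq=1):
        self.next_seq = first_seq  # sequence number consumer B expects next
        self.pending = {}          # stand-in for Redis or an RDBMS table

    def accept(self, seq, payload):
        """Store one message and return every payload that is now releasable, in order."""
        if seq < self.next_seq or seq in self.pending:
            return []  # already released or already buffered: ignore the duplicate
        self.pending[seq] = payload
        released = []
        # Drain the contiguous run that starts at the expected sequence number.
        while self.next_seq in self.pending:
            released.append(self.pending.pop(self.next_seq))
            self.next_seq += 1
        return released


buffer = ReorderBuffer()
early = buffer.accept(2, "B(2)")  # held back, because B(1) has not arrived yet
late = buffer.accept(1, "B(1)")   # releases B(1) and then B(2)
```

Unlike the two-queue merge, this does not depend on how many consumers A exist, but a message that never arrives stalls everything behind it, so a real aggregator would likely need a timeout or dead-letter policy.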