
I am trying to understand the right pattern for dealing with RabbitMQ deliveries in the context of a distributed database transaction.

To keep this simple, I will illustrate my ideas in pseudocode, but I am in fact using Spring AMQP to implement them.

Consider something like this:

void foo(message) {
   processMessageInDatabaseTransaction(message);
   sendMessageToRabbitMQ(message);
}

Here, by the time we reach sendMessageToRabbitMQ(), processMessageInDatabaseTransaction() has either committed its changes to the database successfully, or an exception has been thrown before the message-sending code is reached.

I know that for sendMessageToRabbitMQ() I can use Rabbit transactions or publisher confirms to guarantee that Rabbit got my message.
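
For reference, here is a minimal sketch of what the publishing side might look like with Spring AMQP publisher confirms. The connection factory setup, exchange name, routing key and correlation id are assumptions for illustration only, and the exact property and package names vary a bit between Spring AMQP versions:

import org.springframework.amqp.rabbit.connection.CachingConnectionFactory;
import org.springframework.amqp.rabbit.core.RabbitTemplate;
import org.springframework.amqp.rabbit.support.CorrelationData;

public class ConfirmingPublisher {

    private final RabbitTemplate template;

    public ConfirmingPublisher() {
        CachingConnectionFactory connectionFactory = new CachingConnectionFactory("localhost");
        connectionFactory.setPublisherConfirms(true); // ask the broker to confirm every publish

        this.template = new RabbitTemplate(connectionFactory);
        this.template.setConfirmCallback((correlation, ack, cause) -> {
            // invoked asynchronously when the broker confirms (or nacks) a publish;
            // a nack, or no callback within some timeout, is the failure case discussed below
            if (!ack) {
                // e.g. record the message somewhere so it can be retried later
            }
        });
    }

    public void send(Object payload) {
        // exchange, routing key and correlation id are illustrative placeholders
        template.convertAndSend("events-exchange", "events.created", payload,
                new CorrelationData("some-unique-message-id"));
    }
}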

My interest is in understanding what should happen when things go south, i.e. when the database transaction succeeded but the confirmation does not arrive after a certain amount of time (with publisher confirms), or the Rabbit transaction fails to commit (with Rabbit transactions).

Once that happens, what is the right pattern to guarantee delivery of my message?

Of course, having developed idempotent consumers, I have considered retrying the sending of the messages until Rabbit confirms success:

void foo(message) {
   processMessageInDatabaseTransaction(message);
   retryUntilSuccessful {
      sendMessageToRabbitMQ(message);
   }
}
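
For what it's worth, that retry loop could be sketched with Spring Retry, reusing the pseudocode names sendMessageToRabbitMQ and message from above, so this is a fragment rather than a complete class; the backoff policy is an arbitrary choice:

import org.springframework.retry.backoff.ExponentialBackOffPolicy;
import org.springframework.retry.policy.AlwaysRetryPolicy;
import org.springframework.retry.support.RetryTemplate;

RetryTemplate retryTemplate = new RetryTemplate();
retryTemplate.setRetryPolicy(new AlwaysRetryPolicy());          // keep retrying until the send succeeds
retryTemplate.setBackOffPolicy(new ExponentialBackOffPolicy()); // back off between attempts

retryTemplate.execute(context -> {
    sendMessageToRabbitMQ(message); // any exception thrown here triggers another attempt
    return null;
});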

But this pattern has a couple of drawbacks I dislike. First, if the failure is prolonged, my threads will start to block here and my system will eventually become unresponsive. Second, what happens if my system crashes or shuts down? Those messages will be lost and never delivered.

So I thought: well, I will have to write my messages to the database first, in a pending status, and then publish the pending messages from there:

void foo(message) {
   //transaction commits leaving message in pending status
   processMessageInDatabaseTransaction(message);
}

@Poller(every="10 seconds")
void bar() {
   for(message in readPendingMessagesFromDbStore()) {
      confirmed = sendPendingMessageToRabbitMQ(message);
      if(confirmed) {
          acknowledgeMessageInDatabase(message); 
      }
   }
}

This may send a message multiple times if I fail to acknowledge it in my database.
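
For illustration, a sketch of how that poller could be wired with Spring's @Scheduled and a RabbitTemplate. PendingMessageRepository, PendingMessage and their methods are hypothetical stand-ins for the pending-message table, not real APIs, and scheduling would still need @EnableScheduling somewhere:

import java.util.List;
import org.springframework.amqp.rabbit.core.RabbitTemplate;
import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.stereotype.Component;

@Component
public class PendingMessagePublisher {

    private final PendingMessageRepository repository; // hypothetical DAO over the pending-message table
    private final RabbitTemplate rabbitTemplate;

    public PendingMessagePublisher(PendingMessageRepository repository, RabbitTemplate rabbitTemplate) {
        this.repository = repository;
        this.rabbitTemplate = rabbitTemplate;
    }

    @Scheduled(fixedDelay = 10000) // poll every 10 seconds, as in the pseudocode above
    public void publishPendingMessages() {
        List<PendingMessage> pending = repository.findPending();
        for (PendingMessage message : pending) {
            // convertAndSend throws if the broker is unreachable, so the message simply stays
            // in pending status and is picked up again on the next run; for a stronger guarantee
            // the template would also use publisher confirms or a transacted channel
            rabbitTemplate.convertAndSend("events-exchange", message.getRoutingKey(), message.getPayload());
            repository.markPublished(message.getId());
        }
    }
}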

But now I have introduced other problems:

  • The need to do I/O against the database to publish a message that, 99% of the time, would have been published successfully right away without having to check the database.
  • The difficulty of getting the poller close to real-time delivery, since I have now added latency to the publication of the messages.
  • And perhaps other complications, like guaranteeing delivery of events in order, poller executions stepping on one another, multiple pollers, etc.

And then I thought I could make this a bit more sophisticated: publish from the database until I catch up with the live stream of events and then publish in real time, i.e. maintain a circular buffer of size b and, as I read pages from the database, check whether each message is already in the buffer; if so, switch to the live subscription.

At this point I realized that how to do this right is not exactly evident, so I concluded that I need to learn the right patterns for solving this problem.

So, does anyone have suggestions on the right way to do this correctly?

Edwin Dalorzo

2 Answers


When Rabbit fails to receive a message (for whatever reason, but in my experience only because the service is down or unavailable) you should be in a position to catch an error. At that point, you can make a record of that failed attempt (and any subsequent ones) in order to retry when Rabbit becomes available again. The quickest way of doing this is just logging the message details to a file, and iterating over them to re-send when appropriate.

As long as you have that file, you've not lost your messages.

Once messages are inside Rabbit, and you have faith in the rest of the architecture, it should be safe to assume that messages will end up where they are supposed to be, and that no further persistence work needs doing at your end.
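
A minimal sketch of that idea, using one file per failed message (the variant described in the comments below); the directory name, file layout and the MessagePublisher interface are placeholders for illustration:

import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class FailedMessageStore {

    // placeholder directory; one file per failed message
    private static final Path FAILED_DIR = Paths.get("failed-messages");

    public void record(String messageId, String messageBody) throws IOException {
        Files.createDirectories(FAILED_DIR);
        Files.write(FAILED_DIR.resolve(messageId + ".msg"),
                messageBody.getBytes(StandardCharsets.UTF_8));
    }

    // Iterate over the recorded failures and hand each one back to the publisher;
    // delete a file only after Rabbit has accepted its message again.
    public void resend(MessagePublisher publisher) throws IOException {
        try (DirectoryStream<Path> files = Files.newDirectoryStream(FAILED_DIR, "*.msg")) {
            for (Path file : files) {
                String body = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
                publisher.publish(body); // throws if Rabbit is still down, leaving the file in place
                Files.delete(file);
            }
        }
    }

    // hypothetical publisher abstraction; in this question it would wrap a RabbitTemplate
    public interface MessagePublisher {
        void publish(String messageBody);
    }
}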

HomerPlata
  • I can see your point, @HomerPlata. One question though: now you have pending messages to be sent from your file (due to a previous failure) and also new events arriving from the live stream of events. So you need to process the pending messages from your file before you deal with the live stream of events; how do you deal with that? Are you suggesting this file-processing feature is entirely separate from the current stream of events? – Edwin Dalorzo Feb 21 '17 at 16:42
  • If you just keep appending the failed messages to a log file until the next message eventually gets accepted by Rabbit (indicating you're good to go again), you can then iterate through the log lines and re-add to the message queue, deleting the log file once it's been used. You might run into a bit of a concurrency issue if Rabbit goes down again while you're iterating through the log file and you need to add more failures to it, but it's not impossible to solve. – HomerPlata Feb 21 '17 at 17:06
  • Actually, let me improve on that: If you use a single file per failed message (just serialise them individually) in a specific folder, you can iterate through all the files in that folder without any concurrency issues, adding new "failure" files even if you're in the process of resending. – HomerPlata Feb 22 '17 at 09:26

While RabbitMQ cannot participate in a truly global (XA) transaction, you can use Spring transaction management to synchronize the database transaction with the Rabbit transaction, such that if either update fails, both transactions will be rolled back. There is a (very) small timing hole where one might commit but not the other, so you do need to deal with that possibility.

See Dave Syer's JavaWorld article for more details.
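
A possible configuration sketch of that synchronization, assuming a JDBC DataSource, Spring AMQP's RabbitTransactionManager, and the ChainedTransactionManager from spring-data-commons; the bean wiring and names here are illustrative assumptions:

import javax.sql.DataSource;
import org.springframework.amqp.rabbit.connection.ConnectionFactory;
import org.springframework.amqp.rabbit.core.RabbitTemplate;
import org.springframework.amqp.rabbit.transaction.RabbitTransactionManager;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.data.transaction.ChainedTransactionManager;
import org.springframework.jdbc.datasource.DataSourceTransactionManager;
import org.springframework.transaction.PlatformTransactionManager;

@Configuration
public class TransactionConfig {

    @Bean
    public RabbitTemplate rabbitTemplate(ConnectionFactory connectionFactory) {
        RabbitTemplate template = new RabbitTemplate(connectionFactory);
        template.setChannelTransacted(true); // sends participate in the surrounding Rabbit transaction
        return template;
    }

    // ChainedTransactionManager commits in reverse order of the list, so with the Rabbit
    // manager first and the JDBC manager second, the database commits before the Rabbit
    // transaction does; per the comments below, the appropriate order depends on whether
    // Rabbit or the database is your event source.
    @Bean
    public PlatformTransactionManager transactionManager(ConnectionFactory connectionFactory,
                                                         DataSource dataSource) {
        return new ChainedTransactionManager(
                new RabbitTransactionManager(connectionFactory),
                new DataSourceTransactionManager(dataSource));
    }
}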

Gary Russell
  • Thanks for the link, @Gary. Precisely "dealing with that possibility" is what I'm most interested in discovering with my question here. I have indeed played with Spring AMQP and I have seen how the post-database-commit code deals with committing the Rabbit transaction, but the race condition there is what worries me. For example, in implementing CQRS/ES I must be completely sure I will never lose an event. How do you think one should deal with that possibility you mentioned in your answer? – Edwin Dalorzo Feb 21 '17 at 16:50
  • You will never lose an event if the DB commits before the rabbit send commits. If the server crashes between the two commits, the rabbit message will be redelivered (while the DB has already committed). You just need to be sure the transaction managers are configured to commit in that order. See `ChainedTransactionManager` in spring-data-commons. – Gary Russell Feb 21 '17 at 16:58
  • That discussion applies if you are using Rabbit as the event source. If the database is the event source, you would need them to commit the other way around. – Gary Russell Feb 21 '17 at 17:01
  • If the event comes from a listener container, you don't need a chained TxManager. Just make the container transactional, and the rabbit TX will commit last. Make sure you start the DB transaction at `foo()` so the sends will be done within the scope of both transactions. Make sure the rabbit template is also marked `channelTransacted` so its sends operate within the same Rabbit transaction. – Gary Russell Feb 21 '17 at 17:07
  • I think I am starting to understand it: with the ChainedTransactionManager, messages won't be delivered until the database transaction commits, and if it rolls back the entire transaction is retried and messages are possibly republished, which is acceptable given the at-least-once delivery constraints that Rabbit presupposes. – Edwin Dalorzo Feb 21 '17 at 17:19
  • OK, I gave the `ChainedTransactionManager` a shot and it worked like a charm. I can clearly see that I will need to make my database transaction idempotent, because if the Rabbit transaction fails I will have to repeat the entire thing. I think it still stands that using Rabbit transactions will be less efficient than publisher confirms, and that retrying transactions, if not done efficiently, might make my application unresponsive, but I think this works pretty well. That link you shared is pretty good, Gary. Thanks! – Edwin Dalorzo Feb 22 '17 at 05:06
  • Yes, transactions are considered to be slow, but publisher confirms only really help if you send a bunch of messages and __then__ wait for the confirms. I haven't done any testing, but I suspect that sending a single message and waiting for its confirm will not be much different from blocking on the commit (and you lose the transaction semantics). – Gary Russell Feb 22 '17 at 14:02