1

I have a web application that sends an email on certain action. So, for that, it saves the request in database, and then one email worker thread picks up the request and sends the email. Email worker watches the database for changes.

Now I want to have the same web application running behind load balancer sharing the same database. Now the problem is when I'll create an email request in database, there is a possibility that the email worker running inside the similar web application on different machine behind load balancer might see the database entry of email at the same time and this will result in same email being sent multiple times.

So, Is there any way to prevent this situation other than explicitly locking a table?

I've read this question Distributing java threads over multiple servers? but don't know whether the solutions provided there will suffice my need. Terracotta seems to be the solution, but I think it will need explicit synchronization to be added to the code, don't know.

Any knowledge on this will be helpful.

HelloWorld
  • 123
  • 1
  • 12

2 Answers2

0

A simple, low-tech solution would be to have the email sending worker running just once.

This can be achieved either by extracting the worker to a separate application, which is triggered for example by cron, or by making it configurable, whether an instance of your web application has the email worker activated. In the latter case it is up to you to make sure that only one of the load balanced instances has the email worker active

The downside would be that E-Mail sending would not be redundant, i.e. if the host on which email sending is active is down, no one is sending any emails.

If you need redundancy you'd have to rely on some distributed middleware, as in the question you linked above, or implement a synchronization mechanism yourself.

For an own implementation you could look at optimistic locking using a version number field/column on the email request, as supported by Hibernate/JPA (see https://stackoverflow.com/a/19456821/981913). The algorithm would be something along the following lines:

  1. Worker wakes up, finds email request in the DB, and version column = 0
  2. Worker updates the request with version = 1. Update only succeeds if version was 0 before, meaning no other worker updated it to 1 since the request was read at step 1.
  3. if update from step 2 was successful, worker can safely assume no other worker will process this reques, and go ahead and send the email
Andrei Socaciu
  • 1,198
  • 9
  • 20
0

Use SQL transactions to lock a row for processing.

Your email row should have 2 columns added, TIMESTAMP type: processed_time, send_time but default to NULL

  1. Begin transaction
  2. select 1 item where processed_time is NULL and send_time is NULL and use highest level of serialization for that database (see something like In SQL Server, how can I lock a single row in a way similar to Oracle's "SELECT FOR UPDATE WAIT"?), every DB/ORM has a way of doing this, postres uses SELECT FOR UPDATE, etc.
  3. update row and set processed_time to NOW()
  4. commit (but don't close transaction, if email fails you may need to handle that or rollback 4a. At this point you have claimed that row for yourself so another thread would not get that row at step 2 and needs to be done asap
  5. send email
  6. update row and set send_time to NOW()
  7. End transaction

You may need to fine tune this process based on DB/ORM you are using but the general idea holds.

AlexC
  • 1,395
  • 14
  • 26