I'm writing a Java app to update a relational database (currently H2). I have a process which does the following:

  1. Query an external system for certain data items
  2. Check whether or not those items have already been imported into our system by checking an import log table. If not:
  3. Import the new data items into various tables.
  4. Write a new record into the import log table.

This process might be run concurrently in different threads. I'd like to avoid the problem where two threads might both check the import log, find nothing there, and then both attempt to insert the data items.

What might be a good approach? I've been considering:

  1. Using SERIALIZABLE transaction isolation.
  2. Relying on a unique index constraint in the import log to error and rollback one of the transactions.
  3. Confining the process to a single thread in the Java app.

None of the above seem very appealing, for various reasons -- is there another approach that might work better?
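For reference, option 3 could be approximated with a single-threaded executor; this is just a sketch (the class, the itemId parameter, and the recorded list stand in for the real import steps):

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class SingleThreadImporter {
    // One worker thread serializes all import runs, whether they come
    // from the scheduled poller or from a user-triggered import.
    static final ExecutorService WORKER = Executors.newSingleThreadExecutor();
    static final List<String> IMPORTED = new CopyOnWriteArrayList<>();

    static void submitImport(String itemId) {
        WORKER.submit(() -> {
            // Here the real code would: check the import log, import the
            // item into the various tables, then write the log record.
            // Simulated by recording the item id.
            IMPORTED.add(itemId);
        });
    }

    public static void main(String[] args) throws InterruptedException {
        submitImport("a");
        submitImport("b"); // runs strictly after "a" completes
        WORKER.shutdown();
        WORKER.awaitTermination(5, TimeUnit.SECONDS);
        System.out.println(IMPORTED); // tasks execute in submission order
    }
}
```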

user3364825

2 Answers


I'd invert your process. Instead of polling the source for changes, I'd have your source write the changes into a queue (JMS comes to mind, but it could be any queue). This will make your job easier and will probably perform better.

Doing that in the external system can be as simple as adding some triggers in the database, or a listener in the persistence layer (if there is one), much like any audit procedure would do.

Of course, this option only makes sense if you have control over your source data.
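If JMS is overkill, the same pattern can be sketched in-process with a BlockingQueue standing in for a real message queue (the class and item ids below are made up for illustration):

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class ChangeFeed {
    // Changed item ids pushed by the source; in the real setup a DB
    // trigger or persistence-layer listener would be the producer.
    static final BlockingQueue<String> CHANGES = new LinkedBlockingQueue<>();

    public static void main(String[] args) throws InterruptedException {
        // Producer side: the source enqueues each change as it happens.
        CHANGES.put("item-1");
        CHANGES.put("item-2");

        // Consumer side: a single worker drains and imports in order,
        // so two imports of the same item can never race.
        while (!CHANGES.isEmpty()) {
            String itemId = CHANGES.take();
            System.out.println("importing " + itemId);
        }
    }
}
```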

Why do you need many threads to do the synchronization?

Leo
  • Thanks -- in this case I don't have any control of the external system, unfortunately. – user3364825 Feb 28 '14 at 13:30
  • The concurrency occurs at the moment because 1) There's a periodically scheduled poller, and 2) a user can also trigger the import manually. Instead, I could arrange for both to post to a queue for a single worker thread to process (option #3 in the original question) – user3364825 Feb 28 '14 at 13:34
  • In this case, I think you have basically 2 options here. [1] Try to deal with writing the data concurrently (queue, semaphore, serializable transaction, whatever) if you need the data immediately available, at the cost of synchronizing. With this option, I'd go for your option #3 for the sake of simplicity. Or you can just not care about the data write and process this input in a batch process later at some time (if you don't need the data immediately available), which may be easier or not, depending on the nature of your data. – Leo Feb 28 '14 at 14:18

SERIALIZABLE transaction isolation is certainly the most reliable way to achieve your goal, but it could mean that performance will suffer.

There is one option you have not considered: building your own semaphore.

You could create a static ConcurrentHashMap of items currently being processed: at the start of each insert process, put a record, and remove it when done.

Then each thread could consult this semaphore before starting its inserts.
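A minimal sketch of that idea, using putIfAbsent as the atomic check-and-claim (the class and method names are my own invention):

```java
import java.util.concurrent.ConcurrentHashMap;

public class ImportGuard {
    // Items currently being processed; putIfAbsent is atomic, so it
    // acts as a per-item lock acquisition across threads.
    private static final ConcurrentHashMap<String, Boolean> IN_PROGRESS =
            new ConcurrentHashMap<>();

    /** Try to claim an item; returns true if this thread may import it. */
    public static boolean tryClaim(String itemId) {
        return IN_PROGRESS.putIfAbsent(itemId, Boolean.TRUE) == null;
    }

    /** Release the claim once the import (or its rollback) is finished. */
    public static void release(String itemId) {
        IN_PROGRESS.remove(itemId);
    }

    public static void main(String[] args) {
        boolean first = tryClaim("item-42");
        boolean second = tryClaim("item-42"); // same item: claim fails
        release("item-42");
        boolean third = tryClaim("item-42");  // free again after release
        System.out.println(first + " " + second + " " + third);
    }
}
```

Remember to release the claim in a finally block, or a failed import would leave the item locked forever.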

user3360944