In our ESB project, we have many routes that read files via the file2 or ftp protocol for further processing. It is important to note that the files we read locally (file2 protocol) live on mounted network shares that use different protocols (NFS, SMB).

We are now facing race conditions: both servers read the same file and process it. We have reduced the likelihood of that by using the preMove option, but duplicate reads still occur from time to time when both servers poll at the same millisecond. According to the documentation, an idempotentRepository together with readLock=idempotent could help, for example with Hazelcast.

However, I'm not sure this is a suitable solution for my issue, because I don't know whether it works in all cases. Both servers read the file within milliseconds of each other, so the information that one server has already picked up the file needs to be available in the Hazelcast grid at the moment the second server tries to read it. Is that possible? What happens if there are small latencies (e.g. network related)?

In addition, the setting readLock=idempotent is only available for file2, not for ftp. How can the issue be solved there?

Again: the issue is not preventing duplicate files in general; it is solely about preventing the race condition.

mcode

1 Answer

AFAIK, the idempotent repository should, in your case, prevent both consumers from reading the same file.

The latency between detecting the file and the entry appearing in Hazelcast is not relevant, because the file consumers do not record what they have read. Instead, they both ask the repository for an exclusive read-lock. The first one wins and the second one is denied, so it continues with the next file.
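To make the "first one wins" idea concrete, here is a minimal sketch of an atomic claim on a key. It is only an analogy: a local `ConcurrentHashMap` stands in for the shared repository, and the class and method names (`ReadLockRaceSketch`, `tryAcquire`, `countWinners`) are made up for illustration. It is not the Camel or Hazelcast implementation. The behaviour is comparable to two servers inserting a row with the same primary key: however close together the attempts are, only one can succeed.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

public class ReadLockRaceSketch {

    // Stand-in for the shared repository (assumption: a local map, not a real cluster).
    private final ConcurrentMap<String, Boolean> locks = new ConcurrentHashMap<>();

    /** Returns true only for the first caller that claims the key. */
    public boolean tryAcquire(String fileKey) {
        // putIfAbsent is atomic: it returns null only if no entry existed before.
        return locks.putIfAbsent(fileKey, Boolean.TRUE) == null;
    }

    /** Lets the given number of threads race for the same key and counts the winners. */
    public static int countWinners(String fileKey, int consumers) {
        ReadLockRaceSketch repo = new ReadLockRaceSketch();
        AtomicInteger winners = new AtomicInteger();
        CountDownLatch start = new CountDownLatch(1);
        Thread[] threads = new Thread[consumers];
        for (int i = 0; i < consumers; i++) {
            threads[i] = new Thread(() -> {
                try {
                    start.await(); // hold every thread until all are released together
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                if (repo.tryAcquire(fileKey)) {
                    winners.incrementAndGet();
                }
            });
            threads[i].start();
        }
        start.countDown();
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting", e);
            }
        }
        return winners.get();
    }

    public static void main(String[] args) {
        // Two "servers" polling the same file at the same instant.
        System.out.println("winners: " + countWinners("orders.csv", 2)); // prints "winners: 1"
    }
}
```

The point of the sketch is that the check and the insert are a single atomic step, so there is no window in which both consumers can see "not locked yet" and then both proceed.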

If you want to minimize the potential for conflicts between the consumers, you can turn on shuffle=true to randomize the order in which files are consumed.

For the problem with the missing readLock=idempotent on the ftp consumer: you could perhaps build a separate transfer route with only one consumer that downloads the files. Your file-consumer route can then process them idempotently.
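A rough sketch of that two-stage setup in Camel's Java DSL, assuming Camel and its file/ftp components are on the classpath. The host, credentials, directory paths, the `direct:process` endpoint and the registry name `fileRepo` are all hypothetical placeholders; `fileRepo` is assumed to be an idempotent repository (for example a Hazelcast-backed one) registered under that name. How to make sure the download route runs on only one node is a deployment decision not shown here.

```java
import org.apache.camel.builder.RouteBuilder;

// Stage 1: assumed to be deployed on a single node only.
// Downloads files from FTP into a directory that all nodes can read.
class FtpDownloadRoute extends RouteBuilder {
    @Override
    public void configure() throws Exception {
        from("ftp://user@ftp.example.com/outbox?password=changeme&delete=true")
            .to("file:/mnt/share/inbox");
    }
}

// Stage 2: deployed on every node.
// Each consumer asks the shared repository for a read-lock before taking a file.
class SharedFileRoute extends RouteBuilder {
    @Override
    public void configure() throws Exception {
        from("file:/mnt/share/inbox"
                + "?readLock=idempotent"
                + "&idempotentRepository=#fileRepo"
                + "&shuffle=true")
            .to("direct:process"); // hypothetical processing route
    }
}
```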

burki
  • Thanks, sounds good. I still have trouble imagining how Hazelcast makes sure that only one server adds the read-lock. Can I compare it with a database table where two servers try to add a record with the same primary key? Only one server will succeed; the other one gets a unique constraint violation exception, even if they try to insert in the same millisecond? – mcode Jul 09 '18 at 06:59
  • I don't know the exact internals, but [this page](http://camel.apache.org/hazelcast-idempotent-repository-tutorial.html) with an example says `This implementation guarantees cluster wide that no message will be processed twice.` If you want to know the exact workings, look into the code. A starting point would be [HazelcastIdempotentRepository.java](https://github.com/apache/camel/blob/master/components/camel-hazelcast/src/main/java/org/apache/camel/processor/idempotent/hazelcast/HazelcastIdempotentRepository.java) – burki Jul 09 '18 at 08:23