Issue: I iterate through the files in a directory and scan each one for findings. I use an ExecutorService to create a thread pool with a fixed number of threads and call its submit() method like this:
final List<Future<List<ObjectWithResults>>> futures = Files.walk(baseDirObj)
        .map(baseDirObj::relativize)
        .filter(pathMatcher::matches)
        .filter(filePathObj -> isScannableFile(baseDirObj, filePathObj))
        .map(filePathObj -> executorService.submit(() -> scanFileMethod(baseDirObj, filePathObj, resultMetricsObj, countDownLatchObj)))
        .collect(ImmutableList.toImmutableList());
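A minimal, self-contained sketch of the same submit-per-file pattern follows. It assumes a hypothetical scanFile that only returns its own results and touches no shared state. It differs from the snippet above in two ways worth checking. First, the Files.walk stream is closed with try-with-resources, because it holds open directory handles. Second, results are gathered by calling get() on every Future, which waits for each task and rethrows any exception it threw. Guava's ImmutableList is swapped for Collectors.toList() only to keep the example dependency-free; the class and method names are illustrative.

```java
import java.io.IOException;
import java.nio.file.*;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.*;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class SubmitPerFile {

    // Hypothetical stand-in for scanFileMethod: returns one result per file.
    // It touches no shared mutable state, so it is safe to run concurrently.
    static List<String> scanFile(Path baseDir, Path relativePath) {
        return List.of("scanned:" + relativePath);
    }

    static List<String> scanAll(Path baseDir, String glob, int threads)
            throws IOException, InterruptedException, ExecutionException {
        PathMatcher matcher = FileSystems.getDefault().getPathMatcher("glob:" + glob);
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<List<String>>> futures;
            // Files.walk keeps directory handles open until the stream is closed.
            try (Stream<Path> paths = Files.walk(baseDir)) {
                futures = paths
                        .filter(Files::isRegularFile)
                        .map(baseDir::relativize)
                        .filter(matcher::matches)
                        .map(rel -> pool.submit(() -> scanFile(baseDir, rel)))
                        .collect(Collectors.toList());
            }
            // get() blocks until each task finishes and rethrows its exception.
            List<String> all = new ArrayList<>();
            for (Future<List<String>> f : futures) {
                all.addAll(f.get());
            }
            return all;
        } finally {
            pool.shutdown();
        }
    }
}
```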
The scanFile method runs three concurrent scans, each of which returns a list of results. The results are added using:
resultsListObj.addAll(scanMethod1);
resultsListObj.addAll(scanMethod2);
resultsListObj.addAll(scanMethod3);
followed by:
countDownLatch.countDown();
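If resultsListObj is shared between tasks and is a plain ArrayList, concurrent addAll calls can lose or corrupt entries, which would produce intermittent failures like the ones described here. One way to avoid shared mutation entirely is sketched below, assuming hypothetical scanMethod1..3 helpers that each return a list. Every task builds its own list and returns it, so the caller merges results after Future.get(). The countDown() call is moved into a finally block so that an exception in one scan cannot leave the latch stuck.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;

public class LocalResults {

    // Hypothetical stand-ins for scanMethod1..3; each returns its own list.
    static List<String> scanMethod1(String file) { return List.of(file + ":check1"); }
    static List<String> scanMethod2(String file) { return List.of(file + ":check2"); }
    static List<String> scanMethod3(String file) { return List.of(file + ":check3"); }

    // The list is created inside the task, so only one thread touches it until
    // it is handed back through the Future, which publishes it safely.
    static List<String> scanFile(String file, CountDownLatch latch) {
        try {
            List<String> results = new ArrayList<>();
            results.addAll(scanMethod1(file));
            results.addAll(scanMethod2(file));
            results.addAll(scanMethod3(file));
            return results;
        } finally {
            // Count down even if a scan throws; otherwise await() can stall.
            latch.countDown();
        }
    }
}
```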
In the method that calls executorService.submit() while walking through the files, I then call:
boolean completed = countDownLatch.await(200, TimeUnit.MILLISECONDS);
if (completed) {
    executorService.shutdown();
}
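Two details of the latch code are easy to miss. CountDownLatch waits with await(timeout, unit), which returns false if the count has not reached zero in time, and in the snippet above shutdown() is then never called. Also, shutdown() only stops new submissions and does not wait for running tasks; awaitTermination() does the waiting. The sketch below, with a hypothetical task count and placeholder work, sizes the latch to the number of tasks and shuts the pool down on both paths. When the number of tasks is not known up front (as with a lazily walked stream), waiting on each Future with get() avoids needing a count at all.

```java
import java.util.concurrent.*;
import java.util.concurrent.atomic.AtomicInteger;

public class LatchThenShutdown {

    // Runs `taskCount` placeholder tasks and returns how many of them ran.
    static int runAll(int taskCount, long timeoutMs) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        // The latch count must match the number of tasks that will count down.
        CountDownLatch latch = new CountDownLatch(taskCount);
        AtomicInteger done = new AtomicInteger();
        for (int i = 0; i < taskCount; i++) {
            pool.submit(() -> {
                try {
                    done.incrementAndGet(); // stand-in for the real scan work
                } finally {
                    latch.countDown();
                }
            });
        }
        // Returns false if the count did not reach zero before the timeout.
        boolean completed = latch.await(timeoutMs, TimeUnit.MILLISECONDS);
        // Shut down on every path; shutdown() itself does not wait.
        pool.shutdown();
        if (!completed && !pool.awaitTermination(timeoutMs, TimeUnit.MILLISECONDS)) {
            pool.shutdownNow(); // interrupt stragglers as a last resort
        }
        return done.get();
    }
}
```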
I made the static members that are used in an unsynchronized context volatile, so that a write by one thread is visible to other threads instead of them reading a stale cached value. Initially about 5 to 10% of runs failed (for example, 22 out of 473); making those static variables volatile brought the failure rate down to less than 1%.
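volatile guarantees visibility (a read sees the latest write) but does not make compound operations such as count++ atomic. That expression is a read, an add, and a write, and two threads can interleave them and lose an update. If any of those static members are counters or flags updated with read-modify-write logic, the java.util.concurrent.atomic classes are the usual fix. The sketch below uses hypothetical field names to contrast the two: the volatile counter may end up below the expected total, while the AtomicInteger does not.

```java
import java.util.concurrent.*;
import java.util.concurrent.atomic.AtomicInteger;

public class VolatileVsAtomic {
    static volatile int volatileCount = 0;                        // visible, but ++ is not atomic
    static final AtomicInteger atomicCount = new AtomicInteger(); // atomic read-modify-write

    static void run(int threads, int incrementsPerThread) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            pool.submit(() -> {
                for (int i = 0; i < incrementsPerThread; i++) {
                    volatileCount++;               // read, add, write: updates can be lost
                    atomicCount.incrementAndGet(); // never loses an update
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
    }

    public static void main(String[] args) throws InterruptedException {
        run(8, 100_000);
        System.out.println("volatile: " + volatileCount + ", atomic: " + atomicCount.get());
    }
}
```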
I switched to thread-safe data structures such as ConcurrentHashMap and CopyOnWriteArrayList. The elements added to these collections are assigned to variables declared final, so I expected them to be thread-safe.
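Declaring a reference final only prevents reassigning that variable; it does not make the object it points to, or a sequence of calls on it, thread-safe. Each individual ConcurrentHashMap call is atomic, but a get() followed by a put() is not, so another thread can update the key in between. The sketch below (hypothetical key and method names) compares such a check-then-act update with ConcurrentHashMap.merge, which performs the whole update atomically.

```java
import java.util.Map;
import java.util.concurrent.*;

public class CheckThenAct {

    // Racy: each call is thread-safe on its own, but the get-then-put pair is not.
    static void racyIncrement(Map<String, Integer> counts, String key) {
        Integer old = counts.get(key);
        counts.put(key, old == null ? 1 : old + 1);
    }

    // Atomic: merge performs the read and the write as one operation.
    static void atomicIncrement(ConcurrentHashMap<String, Integer> counts, String key) {
        counts.merge(key, 1, Integer::sum);
    }

    static int count(boolean atomic, int threads, int perThread) throws InterruptedException {
        // final reference, but the map contents are still shared and mutable
        final ConcurrentHashMap<String, Integer> counts = new ConcurrentHashMap<>();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            pool.submit(() -> {
                for (int i = 0; i < perThread; i++) {
                    if (atomic) atomicIncrement(counts, "findings");
                    else racyIncrement(counts, "findings");
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
        return counts.getOrDefault("findings", 0);
    }
}
```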
I introduced a CountDownLatch that each task decrements when it finishes; the submitting method waits on it for a short time before calling the executor service's shutdown() method.
I also added an if (!future.isDone()) check. It sometimes returns true, which means some tasks are taking longer; in those cases I call the overloaded future.get(timeout, unit) to wait longer. I still get 2 to 5 failures in 1000 iterations.
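Tasks passed to submit() never report exceptions on their own: anything thrown inside a task is captured in its Future and only rethrown, wrapped in an ExecutionException, when get() is called. If some Futures are never read, a failing scan can go unnoticed and simply produce missing results. A sketch of collecting every Future with a timeout, so that a failure or timeout surfaces as an exception instead of being skipped; the method name and the choice to cancel on timeout are assumptions for illustration:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.*;

public class CollectWithTimeout {

    // Blocks on every future in order; failures and timeouts are rethrown.
    static <T> List<T> collect(List<Future<List<T>>> futures, long timeout, TimeUnit unit)
            throws InterruptedException {
        List<T> all = new ArrayList<>();
        for (Future<List<T>> f : futures) {
            try {
                all.addAll(f.get(timeout, unit));
            } catch (ExecutionException e) {
                // The task's own exception is stored in the Future and only
                // rethrown here; without get() it would be silently lost.
                throw new IllegalStateException("scan task failed", e.getCause());
            } catch (TimeoutException e) {
                f.cancel(true); // give up on this task and interrupt it
                throw new IllegalStateException("scan task timed out", e);
            }
        }
        return all;
    }

    public static void main(String[] args) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            List<Future<List<String>>> futures = new ArrayList<>();
            futures.add(pool.submit(() -> List.of("a", "b")));
            futures.add(pool.submit(() -> List.of("c")));
            System.out.println(collect(futures, 5, TimeUnit.SECONDS));
        } finally {
            pool.shutdown();
        }
    }
}
```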
Is it better to declare the variables that hold elements added to thread-safe data structures as final or as volatile? I have read a lot about both but am still unclear.
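On the final versus volatile question: volatile applies only to fields (a local variable cannot be volatile), and neither keyword makes the referenced object thread-safe. For elements stored in a java.util.concurrent collection, the collection already guarantees that actions in a thread before inserting an element happen-before actions after another thread reads or removes it. What remains is not mutating elements after they are added, which immutable objects with only final fields guarantee. A sketch with a hypothetical Finding type:

```java
import java.util.List;
import java.util.concurrent.*;

public class ImmutableFindings {

    // Hypothetical result type: all fields final and set in the constructor,
    // so once constructed it can be shared between threads without locking.
    static final class Finding {
        private final String file;
        private final int line;
        private final String message;

        Finding(String file, int line, String message) {
            this.file = file;
            this.line = line;
            this.message = message;
        }

        String file() { return file; }
        int line() { return line; }
        String message() { return message; }
    }

    static List<Finding> scanConcurrently(List<String> files) throws InterruptedException {
        // Adding to a concurrent collection publishes each element safely to readers.
        List<Finding> findings = new CopyOnWriteArrayList<>();
        ExecutorService pool = Executors.newFixedThreadPool(4);
        for (String f : files) {
            pool.submit(() -> findings.add(new Finding(f, 1, "example finding")));
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
        return findings;
    }
}
```

CopyOnWriteArrayList copies its backing array on every write, so with many concurrent adds a ConcurrentLinkedQueue, or per-task lists merged at the end, is usually cheaper.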
Result: For more than 500 test iterations, I consistently see 0.4 to 0.7% failures.
Note: If I synchronize the main scanFile() method, it runs without a single failure, but that negates the performance benefit of multithreading and takes three times as long.
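That synchronizing all of scanFile() removes the failures suggests the race is in shared mutable state that scanFile() touches, rather than in the scans themselves. A middle ground is to run the expensive scan with no lock held and synchronize only the update of the shared collection. A sketch, assuming a hypothetical expensiveScan that has no side effects:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.*;

public class NarrowLock {
    private final List<String> results = new ArrayList<>(); // guarded by `lock`
    private final Object lock = new Object();

    // Hypothetical expensive, side-effect-free scan.
    private static List<String> expensiveScan(String file) {
        return List.of(file + ":ok");
    }

    void scanFile(String file) {
        List<String> local = expensiveScan(file); // runs in parallel, no lock held
        synchronized (lock) {                      // only the shared update is serialized
            results.addAll(local);
        }
    }

    List<String> snapshot() {
        synchronized (lock) {
            return new ArrayList<>(results);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        NarrowLock scanner = new NarrowLock();
        ExecutorService pool = Executors.newFixedThreadPool(4);
        for (int i = 0; i < 100; i++) {
            String file = "file" + i;
            pool.submit(() -> scanner.scanFile(file));
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
        System.out.println(scanner.snapshot().size());
    }
}
```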
What I tried:
- Added a CountDownLatch mechanism.
- Declared the variables holding values added to thread-safe lists and maps as volatile or final.
Expected: 0 failures after switching to thread-safe data structures such as ConcurrentHashMap and CopyOnWriteArrayList. Actual: still 1 to 3 failures every 1000 runs.