
I'm using a Caffeine cache with the following configuration:

```java
datesCache = Caffeine.newBuilder()
                .maximumSize(1000L)
                .expireAfterWrite(1, TimeUnit.HOURS)
                .writer(new CacheWriter<String, Map<String, List<ZonedDateTime>>>() {
                    @Override
                    public void write(@NonNull final String key, @NonNull final Map<String, List<ZonedDateTime>> datesList) {
                        CompletableFuture.runAsync(() -> copyToDatabase(key, datesList), Executors.newCachedThreadPool());
                    }

                    @Override
                    public void delete(@NonNull final String key, @Nullable final Map<String, List<ZonedDateTime>> datesList,
                            @NonNull final RemovalCause removalCause) {
                        System.out.println("Cache key " + key + " got evicted due to " + removalCause);
                    }
                })
                .scheduler(Scheduler.forScheduledExecutorService(Executors.newSingleThreadScheduledExecutor()))
                .removalListener((key, dateList, removalCause) -> {
                    LOG.info("Refreshing cache key {}.", key);
                    restoreKeys(key);
                })
                .build();
```

I'm using a CacheWriter to copy records to a distributed database upon writes to the cache, if the values satisfy certain conditions. I'm also using a RemovalListener so that, upon eviction, a backend service is called with the evicted key to keep the records up to date.

To make this work, I also had to initialize the cache when the service boots up, and I use the `put` method to insert values into the cache. I retrieve values with `datesCache.get(key, datesList -> callBackendService(key))` in case the request is for a key that I didn't receive during initialization.
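For context, this is roughly how I populate and read the cache. It's only a sketch: the method names `initializeCache` and `getDates` are simplified here, and `callBackendService(key)` is assumed to return the `Map<String, List<ZonedDateTime>>` for that key.

```java
// Rough sketch of the code around the cache; initializeCache/getDates are simplified names.
void initializeCache(Map<String, Map<String, List<ZonedDateTime>>> initialEntries) {
    // Each put also triggers CacheWriter.write for that entry.
    initialEntries.forEach(datesCache::put);
}

Map<String, List<ZonedDateTime>> getDates(String key) {
    // If the key wasn't loaded at startup, compute it from the backend on first access.
    return datesCache.get(key, k -> callBackendService(k));
}
```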

The API that leverages this cache has periods of very heavy use, and it seems that for some reason the records were getting evicted (on every request?), because the code in the RemovalListener and the CacheWriter was executed every few milliseconds, eventually creating over 25,000 threads and making the service error out.

Can anybody tell me if I'm doing something deadly wrong, or something painfully obvious that is wrong? I feel like I'm deeply misunderstanding Caffeine.

The goal is to have the records in the cache refresh every hour and, upon refresh, fetch the new values from the backend API and persist them in a database if they satisfy certain conditions.

KassHino
  • Why are you creating a new thread pool on every write that is never shutdown? – Ben Manes Sep 25 '20 at 03:11
  • Hi Ben! Interesting, I didn't know a `CompletableFuture` didn't shutdown upon completion of the body of work. There is other code in the service similar to this, and I assumed those verifications of behavior were done at that time. I can definitely do `CompletableFuture.runAsync(() -> copyToDatabase(key, datesList), executor).thenRun(executor::shutdown)` and I'll certainly revisit the pieces of code where this is there. My question still is why does the `RemovalListener` and the `CacheWriter` code gets invoked so much? Is my cache configuration correct? How is the behavior of these? – KassHino Sep 25 '20 at 03:25
  • I suppose maybe because your removal listener restores the key (a put?). If the removal cause is REPLACED then you’ll have an infinite loop. For the threadpool you want to reuse it and can most often use the default FJP commonPool instead. – Ben Manes Sep 25 '20 at 03:41
  • Yes, it is a `put` on the cache. I'm seeing the `RemovalCause` [javadoc](https://javadoc.io/doc/com.github.ben-manes.caffeine/caffeine/2.3.5/com/github/benmanes/caffeine/cache/RemovalCause.html#REPLACED) and it's starting to make sense to me. So, as part of my solution I should only refresh if RemovalCause is EXPIRED, and those would only be the values evicted by the scheduler, correct? Also, thanks for the advice on how to properly use the thread pool. I appreciate it! – KassHino Sep 25 '20 at 03:51
  • Yep, that could work better. That put then calls the writer, btw. I’d drop the writer as not helpful and do the write next to the cache update code. – Ben Manes Sep 25 '20 at 04:30
  • Makes sense! Thanks a lot Ben. – KassHino Sep 25 '20 at 04:32
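Following the thread-pool suggestion in the comments, a minimal sketch of that fix: reuse one long-lived executor for the async database writes instead of creating a new, never-shut-down pool on every write, or drop the executor argument so `runAsync` uses `ForkJoinPool.commonPool()`. The field name and pool size below are placeholders.

```java
// Reuse a single executor for the cache's async writes; the field name and size are illustrative.
private final ExecutorService writeExecutor = Executors.newFixedThreadPool(4);

@Override
public void write(@NonNull final String key, @NonNull final Map<String, List<ZonedDateTime>> datesList) {
    // Alternatively, call CompletableFuture.runAsync(runnable) with no executor
    // to run on ForkJoinPool.commonPool().
    CompletableFuture.runAsync(() -> copyToDatabase(key, datesList), writeExecutor);
}
```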

1 Answer


The cache entered an infinite loop because the RemovalListener is async, so under heavy load the cache values were being replaced by a request for an expired key before the RemovalListener could actually refresh the cache.

Therefore the values will:

  1. Be removed from the cache with a REPLACED removal cause
  2. Call the RemovalListener
  3. Refresh again and be replaced, and then
  4. Go back to #1.

Solution: Evaluate the RemovalCause in the RemovalListener and ignore REPLACED keys. The wasEvicted() method can be used, or the enum values can be compared directly, as in the sketch below.
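For example, the RemovalListener from the question can be adjusted so that only genuinely evicted entries trigger a refresh (a sketch; `restoreKeys` is the same method as above):

```java
.removalListener((key, dateList, removalCause) -> {
    // wasEvicted() is true for SIZE, EXPIRED and COLLECTED, but false for
    // EXPLICIT and REPLACED, so replaced entries no longer trigger a refresh loop.
    if (removalCause.wasEvicted()) {
        LOG.info("Refreshing cache key {}.", key);
        restoreKeys(key);
    }
})
```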

KassHino