2

so we are using Guava v18. We are going a lot of thread with this content:

"catalina-exec-ajp2" daemon prio=10 tid=0x00007fb45c001800 nid=0x18aa waiting on condition [0x00007fb435151000]
   java.lang.Thread.State: WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for  <0x000000075ff0cfe8> (a com.google.common.util.concurrent.AbstractFuture$Sync)
    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:994)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1303)
    at com.google.common.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:285)
    at com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:116)
    at com.google.common.util.concurrent.Uninterruptibles.getUninterruptibly(Uninterruptibles.java:137)
    at com.google.common.cache.LocalCache$LoadingValueReference.waitForValue(LocalCache.java:3557)
    at com.google.common.cache.LocalCache$Segment.waitForLoadingValue(LocalCache.java:2302)
    at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2191)
    at com.google.common.cache.LocalCache.get(LocalCache.java:3937)
    at com.google.common.cache.LocalCache.getOrLoad(LocalCache.java:3941)
    at com.google.common.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4824)
    at com.google.common.cache.LocalCache$LocalLoadingCache.getUnchecked(LocalCache.java:4830)
    at com.xxx.yyy.repository.cache.acl.guava.GuavaCacheWrapper.get(GuavaCacheWrapper.java:21)
    at com.xxx.yyy.repository.cache.CachedCpaRepository.findAll(CachedCpaRepository.java:32)

Our guava configuration is:

  @SuppressWarnings("unchecked")
  @Override
  protected <K, R> CacheWrapper<K, R> newCacheConfiguration(Fetcher fetcher, String cacheName, int maxElementsInMemory,
      int timeToIdleSeconds, int timeToLiveSeconds)
  {
    Cache existingCache = manager.getCache(cacheName);
    CacheWrapper<K, R> cache;
    if(null == existingCache) 
    {
      cache = new GuavaCacheWrapper<>(CacheBuilder.newBuilder()
        .maximumSize(maxElementsInMemory)
        .expireAfterAccess(timeToIdleSeconds, SECONDS)
        .refreshAfterWrite(timeToLiveSeconds, SECONDS)
        .recordStats()
        .build( CacheLoader.asyncReloading(fetcher.getAs(CacheLoader.class), executors)));
      manager.addCache(cacheName, cache);
    }
    else 
    {
      cache = (CacheWrapper<K, R>) ((GuavaCache) existingCache).getNativaGuavaCache();
    }
    return cache;
  }

I'm a bit clueless on what I could check to see what's generating the problem. Could you suggest what I can check?

dierre
  • 7,140
  • 12
  • 75
  • 120

1 Answers1

1

If you have blocking gets on a LoadingCache, it means you're loading the value for the same key from multiple threads: the first thread actually loads the value, and the next threads wait for that load to complete.

Maybe you have a small number of keys and processing the value is a costly operation? You could log the keys you're trying to load, and the duration of the loading.


Update

The threads doing the actual loading are the ones calling the CacheLoader, but you can't find which corresponds to which waiting thread from the thread dump (if there are multiple threads loading).

I created a test to reproduce the situation:

public class ExpensiveCache {
    private final LoadingCache<String, String> cache = CacheBuilder.newBuilder().build(new ExpensiveCacheLoader());

    public static void main(String... args) {
        new ExpensiveCache().run();
    }

    private void run() {
        // Start 2 threads concurrently accessing the same key in the cache
        for (int i = 0; i < 2; i++) {
            new Thread(() -> cache.getUnchecked("key")).start();
        }
    }

    private static class ExpensiveCacheLoader extends CacheLoader<String, String> {
        @Override
        public String load(String key) throws Exception {
            // Make the computation "expensive" by sleeping for 60s
            Thread.sleep(60_000);
            return key;
        }
    }
}

You indeed can't find the id of the condition anywhere else in the thread dump:

"Thread-2" #12 prio=5 os_prio=31 tid=0x00007fbb6384e000 nid=0x5703 waiting on condition [0x00000001291c5000]
   java.lang.Thread.State: WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for  <0x000000076b183208> (a com.google.common.util.concurrent.AbstractFuture$Sync)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
        at com.google.common.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:285)
        at com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:116)
        at com.google.common.util.concurrent.Uninterruptibles.getUninterruptibly(Uninterruptibles.java:137)
        at com.google.common.cache.LocalCache$LoadingValueReference.waitForValue(LocalCache.java:3557)
        at com.google.common.cache.LocalCache$Segment.waitForLoadingValue(LocalCache.java:2302)
        at com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2289)
        at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2197)
        at com.google.common.cache.LocalCache.get(LocalCache.java:3937)
        at com.google.common.cache.LocalCache.getOrLoad(LocalCache.java:3941)
        at com.google.common.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4824)
        at com.google.common.cache.LocalCache$LocalLoadingCache.getUnchecked(LocalCache.java:4830)
        at io.github.fpavageau.ExpensiveCache.lambda$run$0(ExpensiveCache.java:16)
        at io.github.fpavageau.ExpensiveCache$$Lambda$1/1108411398.run(Unknown Source)
        at java.lang.Thread.run(Thread.java:745)

"Thread-1" #11 prio=5 os_prio=31 tid=0x00007fbb6384d000 nid=0x5503 waiting on condition [0x00000001290c2000]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
        at java.lang.Thread.sleep(Native Method)
        at io.github.fpavageau.ExpensiveCache$ExpensiveCacheLoader.load(ExpensiveCache.java:23)
        at io.github.fpavageau.ExpensiveCache$ExpensiveCacheLoader.load(ExpensiveCache.java:20)
        at com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3527)
        at com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2319)
        at com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2282)
        - locked <0x000000076b199a50> (a com.google.common.cache.LocalCache$StrongEntry)
        at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2197)
        at com.google.common.cache.LocalCache.get(LocalCache.java:3937)
        at com.google.common.cache.LocalCache.getOrLoad(LocalCache.java:3941)
        at com.google.common.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4824)
        at com.google.common.cache.LocalCache$LocalLoadingCache.getUnchecked(LocalCache.java:4830)
        at io.github.fpavageau.ExpensiveCache.lambda$run$0(ExpensiveCache.java:16)
        at io.github.fpavageau.ExpensiveCache$$Lambda$1/1108411398.run(Unknown Source)
        at java.lang.Thread.run(Thread.java:745)
Frank Pavageau
  • 11,477
  • 1
  • 43
  • 53
  • Could you be more specific in where I follow the id of the lock, please? Every thread with that output is locked on 0x000000075ff0cfe8 but I don't understand how to understand which is the thread that is locking first on that thread (from the tdump) – dierre Sep 16 '15 at 16:20
  • I've edited the answer, as it turns out you can't follow the id. You can still check the cache is loading some values, though. – Frank Pavageau Sep 16 '15 at 21:51
  • Thanks, so knowing which thread is locking the cache is not possible using a tdump, right? What we discovered is that with a preload of that key, we don't incur in that problem, using your example, but I'm guessing that's pretty obvious. The system is under heavy heavy load so probably that could be the reason why see threads growing. Thank you very much. We'll investigate more. You've been really helpful. (Although our computation of the key is not THAT costly) – dierre Sep 18 '15 at 05:07
  • Instead of a costly computation, it could also be the result of the expiry and maximum size policies forcing a lot of churn in the cache under the heavy load. Or some external factor triggering the retrieval of an (uncomputed yet) value from multiple threads at once. – Frank Pavageau Sep 18 '15 at 13:18