
I have what I thought was a simple use of a Guava cache, but its behavior is not intuitive to me. I have a POJO, Foo, with an Integer attribute id, which I use as the cache key when retrieving instances of Foo. If I put three items in the cache and then sleep long enough for everything to expire, I would expect the same behavior regardless of the key values. Instead, I see different behavior depending on the keys used. In the first run, I load three objects into the cache with keys 1000, 2000, and 3000:

[main] INFO CacheTestCase - 3000 creating foo, 1000
[main] INFO CacheTestCase - 3000 creating foo, 2000
[main] INFO CacheTestCase - 3000 creating foo, 3000
[main] INFO CacheTestCase - 3000 Sleeping to let some cache expire . . .
[main] INFO CacheTestCase - 3000 Continuing . . .
[main] INFO CacheTestCase - 3000 Removed, 1000
[main] INFO CacheTestCase - 3000 Removed, 2000
[main] INFO CacheTestCase - 3000 creating foo, 1000
[main] INFO CacheTestCase - 

Notice that, in the above run, the instance of Foo with a key of 3000 was not removed from the cache. Below is the output for the same code, but instead of a key of 3000, I used 4000.

[main] INFO CacheTestCase - 4000 creating foo, 1000
[main] INFO CacheTestCase - 4000 creating foo, 2000
[main] INFO CacheTestCase - 4000 creating foo, 4000
[main] INFO CacheTestCase - 4000 Sleeping to let some cache expire . . .
[main] INFO CacheTestCase - 4000 Continuing . . .
[main] INFO CacheTestCase - 4000 Removed, 1000
[main] INFO CacheTestCase - 4000 Removed, 2000
[main] INFO CacheTestCase - 4000 Removed, 4000
[main] INFO CacheTestCase - 4000 creating foo, 1000

Surely, I've done something incredibly stupid. Here's my MCVE:

package org.dlm.guava;

import com.google.common.cache.*;
import org.junit.Test;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import java.util.concurrent.TimeUnit;

/**
 * Created by dmcreynolds on 8/17/2015.
 */
public class CacheTestCase {
    static final Logger log = LoggerFactory.getLogger("CacheTestCase");
    String p = ""; // just to make the log messages different
    int DELAY = 10000; // ms
    @Test
    public void testCache123() throws Exception {
        p = "3000";
        LoadingCache<Integer, Foo> fooCache = CacheBuilder.newBuilder()
                .maximumSize(1000)
                .expireAfterWrite(100, TimeUnit.MILLISECONDS)
                .removalListener(new FooRemovalListener())
                .build(
                        new CacheLoader<Integer, Foo>() {
                            public Foo load(Integer key) throws Exception {
                                return createExpensiveFoo(key);
                            }
                        });

        fooCache.get(1000);
        fooCache.get(2000);
        fooCache.get(3000);
        log.info(p + " Sleeping to let some cache expire . . .");
        Thread.sleep(DELAY);
        log.info(p + " Continuing . . .");
        fooCache.get(1000);
    }


    private Foo createExpensiveFoo(Integer key) {
        log.info(p+" creating foo, " + key);
        return new Foo(key);
    }


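    public class FooRemovalListener
        implements RemovalListener<Integer, Foo> {
        @Override
        public void onRemoval(RemovalNotification<Integer, Foo> removal) {
            // The cause (e.g. EXPIRED, REPLACED) is fetched here but not logged.
            removal.getCause();
            // Integer.hashCode() returns the int value, so this logs the key itself.
            log.info(p+" Removed, " + removal.getKey().hashCode());
        }
    }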

    /**
     * POJO Foo
     */
    public class Foo {
        private Integer id;

        public Foo(Integer newVal) {
            this.id = newVal;
        }

        public Integer getId() {
            return id;
        }
        public void setId(Integer newVal) {
            this.id = newVal;
        }
    }
}
TylerH

2 Answers


From the Javadoc for CacheBuilder:

If expireAfterWrite or expireAfterAccess is requested entries may be evicted on each cache modification, on occasional cache accesses, or on calls to Cache.cleanUp(). Expired entries may be counted by Cache.size(), but will never be visible to read or write operations.

One thing that's saying is that once expired, if you try to read any of the expired entries you will see that they are no longer present. So for example, despite the fact that you're not seeing the entry for 3000 being removed in your RemovalListener, if you called fooCache.get(3000), it would have to load that value first (and you'd see a removal of the old value at that time). So from the perspective of a user of the cache API, the old cached value is gone.

The reason why you're seeing the specific behavior in your examples is pretty simple: the cache is segmented for concurrency reasons. Entries are assigned a segment based on their hash code, and each segment acts like a small independent cache. So most operations (such as fooCache.get(1000)) will only operate on a single segment. In your example, 1000 and 2000 are clearly assigned to the same segment, while 3000 is in another segment. 4000, in your second version, is getting assigned to the same segment as 1000 and 2000, so it gets cleaned up along with the other two when the write of the new value for 1000 happens.

In most real-world use, each segment is hit often enough that expired entries get cleaned up regularly, so this isn't a problem. No guarantee is made about exactly when that will happen, though, unless you call cleanUp() on the cache.
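To make the segment behavior concrete, here is a toy model of it in plain Java. This is only a sketch, not Guava's implementation: the class name SegmentedCacheSketch, the modulo-based segmentFor, and the explicit "now" timestamps are all assumptions made for illustration. Guava spreads hash codes differently, so the keys used here (1, 5, and 2 with four segments) are chosen to fit the toy hash rather than to reproduce the 1000/2000/3000 run.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model only: each segment is an independent map of key -> write time,
// and a write cleans up expired entries in its own segment and nowhere else.
class SegmentedCacheSketch {
    final List<Map<Integer, Long>> segments = new ArrayList<>();
    final List<Integer> removed = new ArrayList<>(); // stands in for a RemovalListener
    final long ttlMillis;

    SegmentedCacheSketch(int segmentCount, long ttlMillis) {
        for (int i = 0; i < segmentCount; i++) {
            segments.add(new LinkedHashMap<>());
        }
        this.ttlMillis = ttlMillis;
    }

    // Hypothetical segment choice; the real cache mixes the hash bits first.
    int segmentFor(int key) {
        return Math.floorMod(Integer.hashCode(key), segments.size());
    }

    // A write performs maintenance on the one segment it touches.
    void put(int key, long now) {
        Map<Integer, Long> segment = segments.get(segmentFor(key));
        expire(segment, now);
        segment.put(key, now);
    }

    // Explicit cleanup visits every segment.
    void cleanUp(long now) {
        for (Map<Integer, Long> segment : segments) {
            expire(segment, now);
        }
    }

    private void expire(Map<Integer, Long> segment, long now) {
        Iterator<Map.Entry<Integer, Long>> it = segment.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<Integer, Long> entry = it.next();
            if (now - entry.getValue() >= ttlMillis) {
                removed.add(entry.getKey());
                it.remove();
            }
        }
    }

    public static void main(String[] args) {
        SegmentedCacheSketch cache = new SegmentedCacheSketch(4, 100);
        cache.put(1, 0);      // segment 1
        cache.put(5, 0);      // segment 1
        cache.put(2, 0);      // segment 2
        cache.put(1, 10_000); // long after expiry; only segment 1 is cleaned
        System.out.println(cache.removed); // prints [1, 5]
        cache.cleanUp(10_000);             // now segment 2 is cleaned as well
        System.out.println(cache.removed); // prints [1, 5, 2]
    }
}
```

Key 2 plays the role of 3000 in your first run: it has expired, but nothing reports its removal until something touches its segment or cleanUp runs.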

ColinD

Maintenance doesn't happen instantly as soon as the timeout occurs.

From the documentation (emphasis mine):

When Does Cleanup Happen?

Caches built with CacheBuilder do not perform cleanup and evict values "automatically," or instantly after a value expires, or anything of the sort. Instead, it performs small amounts of maintenance during write operations, or during occasional read operations if writes are rare.

The reason for this is as follows: if we wanted to perform Cache maintenance continuously, we would need to create a thread, and its operations would be competing with user operations for shared locks. Additionally, some environments restrict the creation of threads, which would make CacheBuilder unusable in that environment.

Instead, we put the choice in your hands. If your cache is high-throughput, then you don't have to worry about performing cache maintenance to clean up expired entries and the like. If your cache does writes only rarely and you don't want cleanup to block cache reads, you may wish to create your own maintenance thread that calls Cache.cleanUp() at regular intervals.

If you want to schedule regular cache maintenance for a cache which only rarely has writes, just schedule the maintenance using ScheduledExecutorService.
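A minimal sketch of that scheduling, assuming a single daemon thread is acceptable in your environment. The class name CacheMaintenance, the method scheduleCleanUp, and the thread name are hypothetical. The helper takes a plain Runnable so it does not depend on any particular cache type.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical helper: runs a cleanup action periodically on a daemon thread.
class CacheMaintenance {
    static ScheduledExecutorService scheduleCleanUp(Runnable cleanUp, long period, TimeUnit unit) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "cache-cleanup");
            t.setDaemon(true); // do not keep the JVM alive just for maintenance
            return t;
        });
        // A periodic task that throws is never run again, so guard the call.
        Runnable guarded = () -> {
            try {
                cleanUp.run();
            } catch (RuntimeException e) {
                System.err.println("cache cleanup failed: " + e);
            }
        };
        scheduler.scheduleWithFixedDelay(guarded, period, period, unit);
        return scheduler;
    }
}
```

With the cache from the question, the call would look something like CacheMaintenance.scheduleCleanUp(fooCache::cleanUp, 1, TimeUnit.SECONDS). Keep the returned executor so you can shut it down when the cache is no longer needed.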

Any of these solutions should work for you, if it's important for the cleanups to happen promptly in your system.


Unrelated, and you probably already know this, but I hope you aren't declaring your cache types as raw types. It's better to specify them with their full type parameters, <Integer, Foo>, to avoid the risk of heap pollution.

durron597