
I am creating my own cache object that will serve several threads. This is an assignment, so I am not allowed to use the packages and jars that are out there. I am trying to figure out how to invalidate all the caches at the same time. I have a data structure that contains a bunch of entries, where the key is an integer and the value is a boolean. When a worker has a miss, it adds the value to its cache. I have several other threads that update this data structure, and once they update it they should invalidate all the other worker threads that have this cache, but only if they have this entry in their cache.

For example, say there are two workers: T1's cache has (1, true) and T2's cache has (3, true), while the data structure has (1, true), (2, true), (3, true). Now the updater changes entry 3 to false. It should check T1 and do nothing, and check T2 and invalidate its entry. However, these two checks should somehow happen at the same time, because if instead T1's cache has (3, true) and T2's cache has (3, true), T1 may be invalidated while T2 is not invalidated yet, and we have inconsistent behavior.

Any ideas? My cache code is:

import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantLock;

public class workerCache<K, V> {
    private final LinkedHashMap<K, V> cache;
    private final ReentrantLock lock;

    @SuppressWarnings("serial")
    public workerCache(final int maxEntries) {
        this.lock = new ReentrantLock();

        // Evict the eldest (insertion-order) entry once the cache grows past maxEntries.
        this.cache = new LinkedHashMap<K, V>(maxEntries + 1) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > maxEntries;
            }
        };
    }

    public void setEntry(K key, V value) {
        lock.lock();
        try {
            cache.put(key, value);
        } finally {
            lock.unlock();
        }
    }

    public void invalidateEntry(K key) {
        lock.lock();
        try {
            cache.remove(key);
        } finally {
            lock.unlock();
        }
    }

    public V get(K key) {
        lock.lock();
        try {
            return cache.get(key);
        } finally {
            lock.unlock();
        }
    }
}
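
For reference, here is how this class would be used in the scenario above (the two per-worker caches and the updater's calls are illustrative, not part of the posted code):

class InvalidationDemo {
    public static void main(String[] args) {
        workerCache<Integer, Boolean> t1 = new workerCache<>(10);
        workerCache<Integer, Boolean> t2 = new workerCache<>(10);

        t1.setEntry(3, true);   // both workers cached entry 3 after a miss
        t2.setEntry(3, true);

        // The updater changed entry 3, so it must invalidate both caches.
        // Between these two calls, T1 and T2 disagree about entry 3:
        // exactly the inconsistency described above.
        t1.invalidateEntry(3);
        t2.invalidateEntry(3);
    }
}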
Quantico
  • Your cache architecture precludes anything but global locking for your requirements. Redesign the cache, and while you're at it, close the possible integrity gaps your current cache API introduces (what happens if the value changes the instant get() returns a value?). – Durandal May 12 '14 at 16:20
  • That is the issue: how do I invalidate all the data exactly after a change? Otherwise I find myself in the case that you just described. – Quantico May 12 '14 at 16:38
  • Think about inversion of responsibility: instead of the updater notifying the caches, have each cache validate that its entries are still valid on every get(). Also think carefully about the responsibilities between worker and cache: who is responsible for handling cache misses? Who controls the integrity of entries? Who controls the integrity of data *derived* from entries (a.k.a. results)? – Durandal May 12 '14 at 16:48
  • Thank you for your answer. If the worker updates the cache on every get, wouldn't that be like not having a cache, i.e. treating every hit as a miss? Further, isn't it the worker's responsibility to update its cache upon a miss? I assumed that the updater should control the integrity of the entries. – Quantico May 12 '14 at 17:22
  • If your updater does not provide a fast way to validate an entry, then yes, it would defeat the purpose. Depending on the *details* of the actual data (how much there is, frequency of change, cost of entry retrieval) you need to work out a method that allows fast entry validation. The simplest method is probably a simple version counter that qualifies the entire cache; a side effect is that any change invalidates all entries. If that's not desired, you can either go down to a version for each entry, or use a partitioning scheme, where each partition has a version. – Durandal May 12 '14 at 17:40
  • As for the worker handling a miss: what's the point/advantage? The *cache* should handle the miss; as the caller, I don't want to care whether the data comes from the cache or not, I just want the data. This also gives the cache a chance to actually control the transactional properties of entry creation. – Durandal May 12 '14 at 17:43
  • Thank you, I understand most of your points besides two. The first: the update of the time stamp. How does one guarantee that all caches see the same time stamp at the same time? The second: if the cache is in charge of updating the values in it upon a miss, do I need to allocate a thread per cache? – Quantico May 12 '14 at 18:20
  • You don't use a *timestamp*, you use a version counter (per entire cache, partition, or entry); the difference is that the counter is incremented by the updater. Caches simply store the version count with their entry; they "validate" the entry by checking that the stored version is the same as the one the updater provides for that item (sort of how modCount works in collections and their iterators; see the sketch after these comments). Entries would only be detected as invalid when they are actually requested, and a *local* cache could/should only be accessed by its worker thread (that would allow you to omit all locking there). – Durandal May 12 '14 at 18:29
  • let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/52548/discussion-between-quantico-and-durandal) – Quantico May 12 '14 at 20:32
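
Pulling the comment thread together, here is a minimal sketch of the whole-cache version counter Durandal describes. Everything in it (VersionedStore, LocalCache, the method names) is hypothetical and only illustrates the idea: the updater bumps one counter on every write, each worker-local cache revalidates against that counter on get(), and because a local cache is touched by exactly one thread it needs no locking.

import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical shared store: one version counter qualifies the entire
// cache (the "simplest method" from the comments); every update bumps it.
class VersionedStore<K, V> {
    private final Map<K, V> data = new ConcurrentHashMap<>();
    private final AtomicLong version = new AtomicLong();

    void update(K key, V value) {
        data.put(key, value);
        version.incrementAndGet(); // any change invalidates all cached entries
    }

    long version() { return version.get(); } // cheap validation read

    V read(K key) { return data.get(key); }  // the expensive retrieval a hit avoids
}

// Hypothetical worker-local cache: accessed by one thread only, so no locking.
class LocalCache<K, V> {
    private final VersionedStore<K, V> store;
    private final Map<K, V> entries = new HashMap<>();
    private long seenVersion = -1;

    LocalCache(VersionedStore<K, V> store) { this.store = store; }

    V get(K key) {
        long v = store.version();
        if (v != seenVersion) { // the store changed since we last validated
            entries.clear();    // coarse, but keeps every worker consistent
            seenVersion = v;
        }
        // The cache itself handles the miss; the worker just asks for data.
        V value = entries.get(key);
        if (value == null) {
            value = store.read(key);
            if (value != null) {
                entries.put(key, value);
            }
        }
        return value;
    }
}

Because every worker validates against the same counter on each access, no worker can serve a value cached before the bump once it validates, so the window where T1 was invalidated but T2 still serves the stale entry goes away. The price is that any change flushes every local cache, which the per-entry or per-partition versions mentioned above would refine.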

1 Answer


It sounds like you're imagining that three threads "T1", "T2", "T3" all have their own copy of workerCache that they need to keep in sync. Is this right?

If so, I would say that's a problem. Instead of three caches (one for each thread), how about one cache shared between all threads?

That way, everybody sees the same data all the time because there's only one copy of it (since there is only one cache). If you invalidate an entry from T1, then everybody "sees" that invalidation at the same time, as a consequence of there being only one cache.

If you have three threads all updating the same key, then the last one into the cache wins. I'm not sure if that's an issue for you or not.
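
For instance, a hypothetical usage sketch built on the workerCache class from the question (which is already internally locked, so nothing more is needed to share it):

class SharedCacheDemo {
    public static void main(String[] args) {
        // One instance shared by every worker and the updater; the internal
        // ReentrantLock serializes access, so once invalidateEntry() returns,
        // no thread can still read the removed entry.
        final workerCache<Integer, Boolean> shared = new workerCache<>(100);

        Runnable worker = () -> {
            Boolean v = shared.get(3);
            if (v == null) {               // miss: fetch or recompute, then cache
                shared.setEntry(3, Boolean.TRUE);
            }
        };

        Runnable updater = () -> shared.invalidateEntry(3); // everyone "sees" it at once

        new Thread(worker).start();
        new Thread(updater).start();
    }
}

The last-writer-wins behavior shows up in setEntry(): if two threads race to repopulate the same key after an invalidation, whichever put() runs last determines what stays cached.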

Am I anywhere approaching the problem?

Darren Gilroy
  • This is an interesting solution. However, doesn't it reduce the effectiveness of the cache? As in, sharing this object among 3 threads will result in threads going to some shared location in memory to access it instead of having a local copy. Maybe I am not getting the whole idea of memory and locality right. – Quantico May 23 '14 at 18:44
  • Memory locality is an issue when you have multiple writers competing for the same cache line. If you have multiple readers, then they can each have their own processor-local copy, as a side effect of how processor caches and cache invalidation work. (I mean the implementation of cache invalidation between cores and sockets and such, down in the hardware.) See http://mechanical-sympathy.blogspot.com/2011/09/single-writer-principle.html for a discussion, esp. "The Principle at Scale". – Darren Gilroy May 26 '14 at 02:21