Cache with long running computations of values

Question

I need to store objects in a cache, and these objects take a long time to create. I started with ConcurrentHashMap<id, Future<Object>> and everything was fine until out-of-memory errors started to happen. I moved to SoftReferences and it was better, but now I need to control eviction. I'm in the process of moving to Ehcache.

I'm sure there is a library for this, but I really need to understand the logic of doing the cache storage and the calculation in two phases, while keeping everything consistent and not recalculating something that is already calculated or in the process of being calculated. It is a two-level cache: one level for the more persistent results and the other for values that are in the process of being calculated.

Any hints on how to improve the following code, which I'm sure has concurrency problems in the Callable.call() method?

public class TwoLevelCache {

    // cache that serializes everything except Futures
    private Cache complexicos = new Cache();

    private ConcurrentMap<Integer, Future<Complexico>> calculations = 
        new ConcurrentHashMap<Integer, Future<Complexico>>();

    public Complexico get(final Integer id) {

        // if in cache return it
        Complexico c = complexicos.get(id);
        if (c != null) { return c; }

        // if in calculation wait for future
        Future<Complexico> f = calculations.get(id);
        if (f != null) { return f.get(); } // exception handling omitted

        // if not, setup calculation
        Callable<Complexico> callable = new Callable<Complexico>() {
            public Complexico call() throws Exception {
                Complexico complexico = compute(id);
                // this might be a problem here
                // but how to synchronize without
                // blocking the whole structure?
                complexicos.put(id, complexico);
                calculations.remove(id);
                return complexico;
            }
        };

        // store calculation 
        FutureTask<Complexico> task = new FutureTask<Complexico>(callable);
        Future<Complexico> future = calculations.putIfAbsent(id, task);
        if (future == null) {
            // not previously being run, so start calculation
            task.run();
            return task.get(); // exceptions obviated
        } else {
            // there was a previous calculation, so use that
            return future.get(); // exceptions obviated
        }

    }

    private Complexico compute(final Integer id) { 
        // very long computation of complexico
    }

}

You could use Java 8's [`Map.computeIfAbsent`](https://docs.oracle.com/javase/8/docs/api/java/util/Map.html#computeIfAbsent-K-java.util.function.Function-) with a `ConcurrentHashMap` to tidy your code rather a lot. `Callable.call` is certainly a bit of a disaster zone. — Boris the Spider, Nov 09 '14 at 19:54
How long does it take between creating a Future and obtaining the result? — Rafal G., Nov 09 '14 at 19:55
@rmarimon Since you were hitting your heap limits with OOM then what is the number of objects being created? — Rafal G., Nov 09 '14 at 20:53
@R4J in the general case is about 200.000 but we currently got into the millions. These are very simple objects where the value takes some time to calculate. — Ricardo Marimon, Nov 09 '14 at 20:58
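
For reference, the `computeIfAbsent` suggestion from the comments could be sketched roughly as follows. This is only a sketch under stated assumptions, not the asker's code: `FutureCache`, its constructor, and the `compute` function passed in are hypothetical names. Because the `ConcurrentHashMap` Javadoc asks that the mapping function be short, the map stores a `CompletableFuture` and the long computation runs on an executor that is assumed to be a real thread pool. It only covers the "compute each id once" part and does nothing about eviction or memory limits.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.Executor;
import java.util.function.Function;

// Hypothetical helper (not from the question): each id is computed at most once.
class FutureCache<K, V> {

    private final ConcurrentMap<K, CompletableFuture<V>> calculations =
        new ConcurrentHashMap<>();
    private final Function<K, V> compute;   // the long-running calculation
    private final Executor executor;        // assumed to be a thread pool

    FutureCache(Function<K, V> compute, Executor executor) {
        this.compute = compute;
        this.executor = executor;
    }

    V get(K id) {
        // The mapping function only submits the work, so it stays short;
        // the map guarantees it is applied at most once per absent key.
        CompletableFuture<V> future = calculations.computeIfAbsent(id,
            key -> CompletableFuture.supplyAsync(() -> compute.apply(key), executor));
        try {
            // Every caller for the same id waits on the same future.
            return future.join();
        } catch (CompletionException e) {
            // Drop a failed calculation so a later call can retry it.
            calculations.remove(id, future);
            throw e;
        }
    }
}
```

Completed futures stay in the map here, so there is no separate removal step that could race with a second lookup. The price is that the map grows without bound, which is exactly the memory problem described in the question, so an evicting cache would still be needed on top of this.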

score 0 · Answer 1 · answered Nov 09 '14 at 21:05

And what do you do with the values once they are calculated? What is the number of new calculations per second?

If they are used (stored) and then disposed of, then I think that a reactive approach (RxJava and similar) could be a nice solution. You could put your "tasks" (a POJO with all the information needed to perform the calculation) on some off-heap structure (it could be a persistent queue, for example) and only perform as many calculations at a time as you want, throttling the process through the number of computation threads you choose to have.

This way you would avoid OOM and would also gain much more control over the entire process.
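
As a rough illustration of the throttling idea only, here is a sketch that uses a plain fixed-size thread pool instead of RxJava. The names `ThrottledCalculator`, `computeAll` and `peakConcurrency` are made up for this example, and the pool's work queue lives on the heap, so the off-heap or persistent queue mentioned above is not shown.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical helper: runs at most `threads` calculations at the same time.
class ThrottledCalculator<K, V> {

    private final ExecutorService workers;
    private final Function<K, V> calculation;
    private final AtomicInteger running = new AtomicInteger();
    private final AtomicInteger peak = new AtomicInteger();

    ThrottledCalculator(int threads, Function<K, V> calculation) {
        // The pool size is the throttle: extra tasks wait in the pool's queue.
        this.workers = Executors.newFixedThreadPool(threads);
        this.calculation = calculation;
    }

    // Submits every id, then waits for the results in submission order.
    List<V> computeAll(List<K> ids) {
        List<CompletableFuture<V>> pending = new ArrayList<>();
        for (K id : ids) {
            pending.add(CompletableFuture.supplyAsync(() -> runOne(id), workers));
        }
        List<V> results = new ArrayList<>();
        for (CompletableFuture<V> future : pending) {
            results.add(future.join());
        }
        return results;
    }

    private V runOne(K id) {
        // Track how many calculations overlap, purely for demonstration.
        int now = running.incrementAndGet();
        peak.accumulateAndGet(now, Math::max);
        try {
            return calculation.apply(id);
        } finally {
            running.decrementAndGet();
        }
    }

    int peakConcurrency() { return peak.get(); }

    void shutdown() { workers.shutdown(); }
}
```

With `new ThrottledCalculator<>(2, calculation)`, no more than two calculations run at once regardless of how many ids are passed to `computeAll`, so the number of half-built results in memory stays bounded by the pool size.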

Rafal G.

The values are paths within a structure of nodes. The pointers are the least distances between nodes, which is usually very little data. I need to keep these paths in memory for as long as possible. – Ricardo Marimon Nov 09 '14 at 21:11
The JVM's heap is a costly place to keep such an amount of data. You should think about some off-heap structures/libraries or just some sort of K/V store like Redis. – Rafal G. Nov 09 '14 at 21:13
