39

The Guava library has its own Supplier, which does not extend the Java 8 Supplier. Guava also provides a cache for suppliers: Suppliers#memoize.

Is there something similar, but for Java 8 Suppliers?

yurez
Cherry
  • 13
    Not exactly, but you can easily convert between j.u.f.Suppliers and c.g.c.b.Suppliers just by writing `::get` at the end. – Louis Wasserman Feb 11 '16 at 05:31
  • 3
    as @LouisWasserman suggests, you could make a wrapper for the guava Suppliers::memoize by basically doing "return Suppliers.memoize(delegate::get)::get;" – jvdneste May 23 '16 at 13:12
  • 4
    It's definitely a pity that Suppliers.memoize did not make it into the jdk8 standard library, considering that it seems like a very low-risk addition to me. – jvdneste May 23 '16 at 13:20

3 Answers

39

The JDK has no built-in function for memoization, but it is not hard to implement, for example like this:

public static <T> Supplier<T> memoize(Supplier<T> delegate) {
    AtomicReference<T> value = new AtomicReference<>();
    return () -> {
        T val = value.get();
        if (val == null) {
            val = value.updateAndGet(cur -> cur == null ? 
                    Objects.requireNonNull(delegate.get()) : cur);
        }
        return val;
    };
}

Note that different implementation approaches exist. The implementation above may call the delegate several times if the memoized supplier is requested simultaneously from different threads. Sometimes such an implementation is preferred over explicit synchronization with a lock. If a lock is preferred, double-checked locking (DCL) could be used:

public static <T> Supplier<T> memoizeLock(Supplier<T> delegate) {
    AtomicReference<T> value = new AtomicReference<>();
    return () -> {
        T val = value.get();
        if (val == null) {
            synchronized(value) {
                val = value.get();
                if (val == null) {
                    val = Objects.requireNonNull(delegate.get());
                    value.set(val);
                }
            }
        }
        return val;
    };
}
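
To illustrate the difference between the two variants, here is a minimal sketch that calls the lock-based supplier from several threads and counts how often the delegate runs. The class name `MemoizeLockDemo` and the helper `countDelegateCalls` are made up for this example; `memoizeLock` is repeated from above so the sketch is self-contained.

```java
import java.util.Objects;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

public class MemoizeLockDemo {

    // Same double-checked locking approach as memoizeLock above.
    public static <T> Supplier<T> memoizeLock(Supplier<T> delegate) {
        AtomicReference<T> value = new AtomicReference<>();
        return () -> {
            T val = value.get();
            if (val == null) {
                synchronized (value) {
                    val = value.get();
                    if (val == null) {
                        val = Objects.requireNonNull(delegate.get());
                        value.set(val);
                    }
                }
            }
            return val;
        };
    }

    // Hypothetical helper: issues `threads` concurrent get() calls and
    // returns how many times the delegate was actually invoked.
    static int countDelegateCalls(int threads) throws InterruptedException {
        AtomicInteger calls = new AtomicInteger();
        Supplier<String> memoized = memoizeLock(() -> {
            calls.incrementAndGet();
            return "expensive";
        });
        CountDownLatch start = new CountDownLatch(1);
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int i = 0; i < threads; i++) {
            pool.execute(() -> {
                try {
                    start.await(); // release all workers at roughly the same time
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                memoized.get();
            });
        }
        start.countDown();
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
        return calls.get();
    }

    public static void main(String[] args) throws InterruptedException {
        // Expected to print 1: the lock admits only one delegate call.
        System.out.println(countDelegateCalls(8));
    }
}
```

Running the same experiment against the first, lock-free `memoize` may occasionally report more than one delegate call, which is the trade-off described above.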

Also note that, as @LouisWasserman correctly mentioned in the comments, you can easily convert a JDK supplier into a Guava supplier and vice versa using a method reference:

java.util.function.Supplier<String> jdkSupplier = () -> "test";
com.google.common.base.Supplier<String> guavaSupplier = jdkSupplier::get;
java.util.function.Supplier<String> jdkSupplierBack = guavaSupplier::get;

So it's not a big problem to switch between Guava and JDK functions.

Tagir Valeev
  • 2
    You don't really need the `AtomicReference` in this case, do you? It seems to be used just as a mutable container that the lambda can close over. If you want to save one object allocation I think you could return an anonymous class instance with a volatile `value` field. And synchronize on `this`. – Lii Feb 11 '16 at 08:04
  • 3
    @Lii, `AtomicReference` is just an object with single field which affords volatile read/write semantic which is necessary here. It's possible to replace it with anonymous class volatile field (in the second sample only, not in the first one), but it's not very evident whether such optimization matters. Besides, locking on publicly available object is considered a bad practice. – Tagir Valeev Feb 11 '16 at 08:26
  • 4
    You can eliminate the volatile semantics by remembering another supplier of the form `()->val`. That way, you are using the `final` field semantics of the captured value. – Holger Feb 11 '16 at 08:34
28

The simplest solution would be

public static <T> Supplier<T> memoize(Supplier<T> original) {
    ConcurrentHashMap<Object, T> store=new ConcurrentHashMap<>();
    return ()->store.computeIfAbsent("dummy", key->original.get());
}

However, the simplest is not always the most efficient.

If you want a clean and efficient solution, resorting to an anonymous inner class to hold the mutable state will pay off:

public static <T> Supplier<T> memoize1(Supplier<T> original) {
    return new Supplier<T>() {
        Supplier<T> delegate = this::firstTime;
        boolean initialized;
        public T get() {
            return delegate.get();
        }
        private synchronized T firstTime() {
            if(!initialized) {
                T value=original.get();
                delegate=() -> value;
                initialized=true;
            }
            return delegate.get();
        }
    };
}

This uses a delegate supplier which performs the first-time operation and afterwards replaces itself with a supplier that unconditionally returns the captured result of the first evaluation. Since the captured value has final field semantics, it can be returned without any additional synchronization.

Inside the synchronized method firstTime(), there is still an initialized flag needed because in case of concurrent access during initialization, multiple threads may wait at the method’s entry before the delegate has been replaced. Hence, these threads need to detect that the initialization has been done already. All subsequent accesses will read the new delegate supplier and get the value quickly.
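
As a small usage sketch of the behavior described above (single-threaded, so it says nothing about the concurrency guarantees), the following shows that the original supplier is not invoked until the first `get()` and is invoked only once. The class name `SelfReplacingMemoizeDemo` is made up for this example; `memoize1` is copied from above.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

public class SelfReplacingMemoizeDemo {

    // Copy of memoize1 from above, unchanged in behavior.
    public static <T> Supplier<T> memoize1(Supplier<T> original) {
        return new Supplier<T>() {
            Supplier<T> delegate = this::firstTime;
            boolean initialized;

            @Override
            public T get() {
                return delegate.get();
            }

            private synchronized T firstTime() {
                if (!initialized) {
                    T value = original.get();
                    delegate = () -> value; // replace ourselves with a constant supplier
                    initialized = true;
                }
                return delegate.get();
            }
        };
    }

    public static void main(String[] args) {
        AtomicInteger calls = new AtomicInteger();
        Supplier<Integer> memoized = memoize1(calls::incrementAndGet);

        System.out.println(calls.get());    // 0: nothing is computed before the first get()
        System.out.println(memoized.get()); // 1: the first get() runs the original supplier
        System.out.println(memoized.get()); // 1: cached value, the original is not called again
        System.out.println(calls.get());    // 1
    }
}
```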

Holger
  • 1
    Very interesting answer. Can you explain how returning `delegate.get()` directly without synchronisation or `volatile` modifier is thread-safe? How is it guaranteed that all threads arriving there see the updated delegate when they call `get`? – glts Feb 11 '16 at 16:47
  • 3
    @glts: `delegate.get()` will end up either at the `synchronized` method `firstTime()` for the first invocation(s) or at the instance associated with the `() -> value` lambda expression, where `value` is effectively final. Accessing that captured value is equivalent to reading a `final` field, which is safe without additional synchronization. In case a thread sees a stale value for the `delegate` reference, it will go through the `synchronized` `firstTime()` method for a single invocation and will know the up-to-date value afterwards, so all subsequent invocations go the fast path then. – Holger Feb 11 '16 at 17:23
  • 3
    why doesn't `delegate` need to be marked `volatile` in that case? – Mark Elliot Jan 15 '17 at 00:02
  • 4
    @Mark Elliot: captured values of local variables have `final` field semantics, so if a thread encounters the new delegate, i.e. `() -> value`, it will also read `value` properly. Due to the absence of `volatile`, a thread may encounter the old `delegate`, but in this case it will enter the `synchronized` method `firstTime()` and read the up-to-date values of `initialized` and `delegate`. This may happen at most once for every thread, as afterwards, it has the most recent values. – Holger Jan 16 '17 at 08:49
  • This is really ingenious; a function that replaces the reference to itself. Presumably you only do this if there's a chance the supplier might not be called (so the laziness avoids ever running the computation) or you need the computation delayed? Otherwise you'd just compute the value in the constructor and avoid the delegate right? Or should I view such a delegate call as zero overhead? – charles-allen Jul 31 '17 at 04:22
  • 1
    @CodeConfident: you can view the delegation as almost zero overhead, still, you shouldn’t use this code with higher complexity without a reason, i.e. if there is indeed a chance of not needing the value at all or if you expect a significant delay between the construction of the `Supplier` and the first actual value access. For ordinary cases, you should just calculate the value in the constructor (in case of a class) or calculate before capturing (`T value = calculate(); Supplier s = () -> value;`) – Holger Jul 31 '17 at 08:20
  • @Holger: Silly question... Do you need the initialized field? It's changed at the same time the delegate is changed. Can you not cut `firstTime()` down to `T value=original.get(); delegate=() -> value; return value;` – charles-allen Jul 31 '17 at 08:53
  • 5
    @CodeConfident: that’s explained in the last paragraph of the answer. If multiple threads call `get()` at the same time when it is in the uninitialized state, they all read the reference to the initial supplier and try to enter the `firstTime()` method, all but one getting blocked due to the `synchronized`. After the first thread has completed the initialization, the pending threads will proceed one after another, each of them having to detect that the initialization has been done already. That’s a rare scenario, but in a general solution, it must be handled. – Holger Jul 31 '17 at 09:00
  • @Holger this makes me go to a forest, hide under a rock and pretend I never ever wrote code in my life. Exceptionally ingenious - I understood it today, but most certainly will forget what I understood by tomorrow... Even guava is using the double checked locking – Eugene Jul 31 '17 at 14:12
  • @Holger: I'd read it but I had understood it incorrectly. Thanks for your patience explaining it again! I've learned some new FP & CP here! :) – charles-allen Jul 31 '17 at 15:28
  • I would like to remind that Guava `Suppliers.memoize` have additional feature described in doc "If delegate is an instance created by an earlier call to memoize, it is returned directly." So I wouldn't recommend to use anonymous class here, but rather create new private static inner class and add simplest condition in `memoize1` method. – iMysak Jul 18 '18 at 23:19
  • 1
    @iMysak if that feature is important to you, you have to go for a named nested class. However, I don’t consider it important, as even if you wrap such a supplier into another, the original supplier is called only once. I’d rather question the application design when it mindlessly calls `memoize` for arbitrary suppliers, without knowing whether it will pay off. – Holger Jul 19 '18 at 07:19
  • 2
    using a concurrent hashmap for a single element to get a lock seems a huge waste to me. – Alexander Oh Aug 07 '18 at 08:12
  • 1
    @Alex it’s not for getting a lock, but getting an initialization lock and subsequent lock-free access. As the answer already said, the map based solution is the simplest, but “the simplest is not always the most efficient”. But don’t overestimate the overhead of a `ConcurrentHashMap`, after all, it’s just an object, regardless of the *functionality* it offers. Are you aware that in the reference implementation, every `HashSet` is a wrapper around a `HashMap`? *That’s* what I call an overhead, still, we all live with it for two decades now… – Holger Aug 11 '18 at 11:19
  • @Holger Both the statements the approach relies upon are groundless: [1]"`() -> value` lambda expression, whereas `value` is effectively final. Accessing that captured value is equivalent to reading a final field"; [2]"In case a thread sees a stale value for the `delegate` reference, it will go through the `synchronized firstTime()` method for a single invocation and will know the up-to-date value afterwards". See the thread http://cs.oswego.edu/pipermail/concurrency-interest/2019-August/016917.html for more details, and reply there if you have time (SO is a terrible place for discussions). – Valentin Kovalenko Aug 27 '19 at 17:55
  • 2
    @Male that discussion is pointless, as it tries to discuss an implementation-independent view on an implementation detail. From a JLS point of view, a lambda has no fields, but just uses variables of the surrounding context. The fields used to hold copies of the values are already an unspecified implementation detail. So when a lambda expression uses a local variable of the surrounding context, it contradicts §17.4.1. which says that local variables “are never shared between threads”, which obviously, needs to be fixed. The intent is to have immutable resp. stateless lambdas, immune to races. – Holger Aug 28 '19 at 07:09
  • @Holger On the opposite, that discussion is about specification-guaranteed behaviour, while your implementation is based on the assumptions which are correct only for specific implementations and hardware (especially [2], which is wrong from the standpoint of JLS). All I am saying is if you can't support [1] and [2] with JLS (so far you have not done this), you can't claim the code is correct. – Valentin Kovalenko Aug 28 '19 at 16:24
  • @Holger "The intent is to have immutable resp. stateless lambdas" - a result of a lambda expression (let's call it a lambda-object) is obviously stateful and mutable if it captures a (surely final) variable which points to a mutable object. But this is again not relevant to your claim that writes and reads of the state of lambda-objects comply with the semantics of final fields. This statement should either be proven with JLS/JVMS, or treated as baseless. Note that it's not me who started applying the aforementioned semantics to the state of lambda-objects. – Valentin Kovalenko Aug 28 '19 at 16:34
  • 1
    @Male I don’t see how the second point is wrong. The JLS clearly forbids out-of-thin-air values, so if the lambda (reference) is not the new one, it must be the old one, which does invoke the `synchronized` method. A lambda expression may point to mutable objects, but combining caching suppliers with subsequent modifications would be an error in itself, regardless of the caching solution. So that’s irrelevant here. A lambda is as immutable as strings are, which contain a reference to a mutable array. Now I really wish someone would implement a contradicting JRE to discuss with the JLS authors – Holger Aug 29 '19 at 07:18
  • @Holger I would like to methodically discuss this with you (or anyone else for that matter) but StackOverflow's comment length limit makes it impossible. I suggested concurrency-interest, you refused. If you want, contact me via any of https://sites.google.com/site/aboutmale/contacts, and I will be glad to have the discussion as comprehensive as needed. In short: immutable and unmodifiable are two very different properties; [2] is not about out-of-thin-air values, but about the possibility to never observe the read of a new value to `delegate`; lambdas are immutable - this is wishful thinking. – Valentin Kovalenko Aug 29 '19 at 19:26
  • 1
    @Male when a thread does not observe the new `delegate`, it observes the old value, hence, will enter the `synchronized` method. What’s so hard to understand about it? – Holger Aug 30 '19 at 07:27
  • 1
    @Holger Yes, if a new `delegate` value is not observed, then the one that acquires a monitor (enters `synchronized` block) is used. This is true, and this is exactly what I was saying previously. But the catch is that the new value is allowed to never be observed. Therefore each `get` is allowed to always acquire a monitor, which is a blocking and much more costly operation than a volatile read that the implementation tries hard to omit. So instead of being more effective than DCL (after all there are no volatile reads at all) the implementation may be as ineffective as a synchronized method. – Valentin Kovalenko Aug 30 '19 at 14:32
  • 1
    @Holger Add here the fact that the implementation relies on a property of a lambda-object (the one you stated explicitly and I mentioned in [1]) which is not supported by JLS requirements, and we have a solution that is incorrect (a reader is allowed to observe a not completely initialized state of the supplied `value`) and potentially as efficient as a `synchronized` method (i.e. not efficient). Interestingly, making the `delegate` field `volatile`, solves both problems. E.g, it guarantees that the number of times a monitor is acquired is finite, provided that the number of threads is finite. – Valentin Kovalenko Aug 30 '19 at 14:42
  • 1
    @Male making the field `volatile` does not guaranty that the supplier is executed at most one time. You’d still need a lock or `synchronized` to prevent concurrent execution. And once you have executed the `synchronized` method, you have the visibility guaranty. Of course, only for the thread which executed the `synchronized` method, so in the worst case, each thread will execute the `synchronized` method once. But after that, subsequent reads of the same thread will read the new value without additional synchronization. But if you want a simple solution, look at the beginning of the answer. – Holger Aug 30 '19 at 14:58
  • @Holger "volatile does not guaranty that the supplier is executed at most one time" - I have not said this: there is a difference between "the number of times a monitor is acquired is finite" and "at most one time". "And once you have executed the synchronized method, you have the visibility guaranty" - this is false, you are making a classic mistake thinking that one may have a synchronization action when writing and then read without any synchronization. http://cs.oswego.edu/pipermail/concurrency-interest/2019-August/016917.html provides a formal explanation. – Valentin Kovalenko Aug 30 '19 at 15:45
  • 1
    @Holger "And once you have executed the synchronized method, you have the visibility guaranty. Of course, only for the thread which executed the synchronized method, so in the worst case, each thread will execute the synchronized method once" - this is exactly what I mentioned in [2]. This statement is simply wishful thinking which is not based on JLS. JLS explicitly allows reads to observe writes (to the same variable) that "immediately" happened-before the read _or other writes that are not happens-before-ordered with the read_ (happens-before consistent executions, JLS 17.4.7). – Valentin Kovalenko Aug 30 '19 at 15:53
  • 1
    @Male sure, the JLS allows to observe other writes, but of course, only other writes that do actually exist. As already said, the JLS forbids out-of-thin-air values. So what “other writes” are you talking about? The execution of the `synchronized` method enforces an ordering, so the thread sees *at least all writes made in previous executions* of the method and since that method does *at most one write* (in the first execution), the subsequent executes will perceive that only write. It’s understood that you don’t agree on the visibility of the captured `value`, but what’s the second problem? – Holger Aug 30 '19 at 16:36
  • @Holger Turns out, StackOverflow has a chat, and it suggested to go there instead of increasing the number of comments here. So let's [continue this discussion in chat](https://chat.stackoverflow.com/rooms/198740/discussion-between-male-and-holger). – Valentin Kovalenko Aug 30 '19 at 20:06
  • horrible!!! 1) ConcurrentHashMap - you will get a memory leak. 2) return new Supplier() { ..} - you will get a lot of useless classes – Yura Sep 11 '19 at 13:12
  • 2
    @Yura the `ConcurrentHashMap` will exist as long as the `Supplier` exist. So the supplier consumes more memory than other solutions, but this is not a memory leak. Further, a construct like `new Supplier() { ..}` defines exactly one class. Just like each lambda expression produces exactly one class with the current implementation. In practice, you won’t notice a significant difference to other solutions, like those capturing an `AtomicReference`. But if you want efficiency, eliminate the need for cached suppliers in your application logic in the first place. – Holger Sep 11 '19 at 14:11
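
Two suggestions from the comment thread above can be combined into one variant: marking `delegate` as `volatile` (Valentin Kovalenko) and using a named nested class so that an already memoized supplier is returned as-is (iMysak). The following is only a sketch of that reading of the comments, not code posted by any of the participants, and it does not settle the disagreement about whether the non-volatile version is correct under the JLS. The names `VolatileMemoize` and `MemoizingSupplier` are made up for this example.

```java
import java.util.function.Supplier;

public final class VolatileMemoize {

    private VolatileMemoize() {}

    // Named nested class so that memoize() can recognize its own instances.
    private static final class MemoizingSupplier<T> implements Supplier<T> {
        private final Supplier<T> original;
        // volatile: a thread that reads the replaced delegate also sees
        // everything written before the replacement, including the value.
        private volatile Supplier<T> delegate = this::firstTime;
        private boolean initialized; // only accessed while holding the monitor of this

        MemoizingSupplier(Supplier<T> original) {
            this.original = original;
        }

        @Override
        public T get() {
            return delegate.get();
        }

        private synchronized T firstTime() {
            if (!initialized) {
                T value = original.get();
                delegate = () -> value;
                initialized = true;
            }
            return delegate.get();
        }
    }

    public static <T> Supplier<T> memoize(Supplier<T> original) {
        if (original instanceof MemoizingSupplier) {
            return original; // already memoized, no second wrapper needed
        }
        return new MemoizingSupplier<>(original);
    }
}
```

The price is one volatile read per `get()`, which is the cost the original answer set out to avoid; whether that matters in practice depends on the application.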
5

A simple wrapper for Guava 20 on Java 8:

static <T> java.util.function.Supplier<T> memoize(java.util.function.Supplier<? extends T> supplier) {
    return com.google.common.base.Suppliers.memoize(supplier::get)::get;
}
Kohei Nozaki
  • 1
    I think you could just return `com.google.common.base.Suppliers.memoize(supplier::get)`. Guava suppliers extend java suppliers. – Pawel Zieminski Oct 01 '20 at 21:41
  • 1
    @PawelZieminski right, if it's Guava 21 or greater it would be better https://github.com/google/guava/wiki/Release21#commonbase – Kohei Nozaki Oct 02 '20 at 00:25