In a Java 8 stream, filter passes every element of the collection to its predicate to check the condition. Below I have written two different filter conditions, and they behave differently.

public static void main(String[] args) {

    List<String> asList = Arrays.asList("a", "b", "c", "d", "e", "a", "b", "c");

    //line 1
    asList.stream().map(s -> s).filter(distinctByKey(String::toString)).forEach(System.out::println);

    Predicate<String> strPredicate = (a) -> {
        System.out.println("inside strPredicate method--");
        return a.startsWith("a");
    };

    //line 2
    asList.stream().filter(strPredicate).forEach(System.out::println);
}

public static <T> Predicate<T> distinctByKey(Function<? super T, ?> keyExtractor) {
    System.out.println("inside distinctByKey method...");
    Set<Object> seen = ConcurrentHashMap.newKeySet();
    return t -> seen.add(keyExtractor.apply(t));
}

In the sample code above, the print statement for the line 1 filter (inside distinctByKey) runs only once, but the print statement for the line 2 filter (inside strPredicate) runs for every element in the collection.

I thought the distinctByKey method would be executed for every element in the collection, but that is not the case. Why?

Also, is the Set referenced by the variable seen created only once? How does the flow work?

Didier L
Learn Hadoop
  • `distinctByKey()` runs only once because it creates a new lambda for the predicate, which then is executed on every element. – Thomas Aug 22 '18 at 12:42
  • `.map(s -> s)` does literally nothing, by the way – Michael Aug 22 '18 at 12:43
  • you should also say that this code is taken *literally* from [Stuart Marks's answer](https://stackoverflow.com/a/27872852/1059372) – Eugene Aug 22 '18 at 12:51

2 Answers

distinctByKey is a lambda factory method. It returns a Predicate<T>.

So when you execute filter(distinctByKey(String::toString)), you are in fact calling the distinctByKey method first, which returns a Predicate. That predicate is then executed for every element. Only the factory method itself is executed once.
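One way to see this is to pull the factory call out into a local variable, which is equivalent because the argument is evaluated before filter() is invoked. The following is a minimal sketch, not part of the original code: the class name `FactoryCallDemo`, the `run()` helper, and the two counters are hypothetical and exist only to count how often each piece runs.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Hypothetical demo class; the counters are added purely for illustration.
final class FactoryCallDemo {
    static final AtomicInteger factoryCalls = new AtomicInteger();
    static final AtomicInteger predicateCalls = new AtomicInteger();

    static <T> Predicate<T> distinctByKey(Function<? super T, ?> keyExtractor) {
        factoryCalls.incrementAndGet();            // runs once per call to distinctByKey
        Set<Object> seen = ConcurrentHashMap.newKeySet();
        return t -> {
            predicateCalls.incrementAndGet();      // runs once per element tested
            return seen.add(keyExtractor.apply(t));
        };
    }

    static List<String> run() {
        factoryCalls.set(0);
        predicateCalls.set(0);
        List<String> input = Arrays.asList("a", "b", "c", "d", "e", "a", "b", "c");
        // Equivalent to input.stream().filter(distinctByKey(String::toString)):
        // the predicate object is built here, before the stream processes anything.
        Predicate<String> predicate = distinctByKey(String::toString);
        return input.stream().filter(predicate).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(run());                                     // [a, b, c, d, e]
        System.out.println("factory calls: " + factoryCalls.get());     // 1
        System.out.println("predicate calls: " + predicateCalls.get()); // 8
    }
}
```

With a sequential stream over the eight-element list from the question, the factory counter ends at 1 and the predicate counter at 8.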

If you add a System.out.println inside the returned lambda, you'll get the print statements you expected:

public static <T> Predicate<T> distinctByKey(Function<? super T, ?> keyExtractor) {
    System.out.println("inside distinctByKey method...");
    Set<Object> seen = ConcurrentHashMap.newKeySet();
    return t -> {
        System.out.println("inside distinctByKey.lambda method... ");
        return seen.add(keyExtractor.apply(t));
    };
}
Lino

The seen variable is captured by the lambda expression and kept inside it. Once you return the Predicate, Predicate::test will be called multiple times, each time with the same instance of seen.
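A minimal sketch of what capturing means in practice, assuming the same distinctByKey shape as in the question (the class name `CaptureDemo` and its two helper methods are hypothetical): every call to the factory creates its own seen set, and a predicate keeps its set for as long as it lives, even across separate streams.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Hypothetical demo class illustrating the captured "seen" set.
final class CaptureDemo {
    static <T> Predicate<T> distinctByKey(Function<? super T, ?> keyExtractor) {
        Set<Object> seen = ConcurrentHashMap.newKeySet();  // one set per factory call
        return t -> seen.add(keyExtractor.apply(t));        // the lambda captures that set
    }

    // One predicate shared by two streams: the second stream sees the set
    // already filled by the first, so every element is rejected.
    static List<List<String>> reusePredicate() {
        List<String> input = Arrays.asList("a", "b", "a");
        Predicate<String> shared = distinctByKey(String::toString);
        List<String> first = input.stream().filter(shared).collect(Collectors.toList());
        List<String> second = input.stream().filter(shared).collect(Collectors.toList());
        return Arrays.asList(first, second);
    }

    // A fresh predicate per stream: each one starts with its own empty set.
    static List<List<String>> freshPredicates() {
        List<String> input = Arrays.asList("a", "b", "a");
        List<String> first = input.stream()
                .filter(distinctByKey(String::toString)).collect(Collectors.toList());
        List<String> second = input.stream()
                .filter(distinctByKey(String::toString)).collect(Collectors.toList());
        return Arrays.asList(first, second);
    }

    public static void main(String[] args) {
        System.out.println(reusePredicate());   // [[a, b], []]
        System.out.println(freshPredicates());  // [[a, b], [a, b]]
    }
}
```

The practical consequence is that a predicate built this way is stateful, so it is usually safer to create a new one for each stream than to store and reuse it.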

Eugene
  • Now this answer adds to your answer, @Lino. The concept of *capturing* a local variable inside a lambda well deserves a new answer. This same technique can be applied to implement different functional programming concepts, e.g. memoization. +1 – fps Aug 22 '18 at 12:52
  • @FedericoPeraltaSchaffner should we also add that it's something one *has to get used to*? It's not trivial at all to understand this when you first see it – Eugene Aug 22 '18 at 12:53
  • Yes, that's so true. It's not intuitive; when I first saw this technique I also thought that my method was going to be called for every element of the stream. But once you understand the difference between *imperative* and *declarative*, you see the light. – fps Aug 22 '18 at 12:57
  • If one understands the concept of closures, e.g. from JavaScript, then this may be more familiar/intuitive – Lino Aug 22 '18 at 13:01