
Please consider this code:

System.out.println("#1");
Stream.of(0, 1, 2, 3)
        .peek(e -> System.out.println(e))
        .sorted()
        .findFirst();

System.out.println("\n#2");
IntStream.range(0, 4)
        .peek(e -> System.out.println(e))
        .sorted()
        .findFirst();

The output will be:

#1
0
1
2
3

#2
0

Could anyone explain why the output of the two streams is different?

Pavel_K
  • Interesting optimisation! I suppose `.sorted()` on that `IntStream` is just a no-op because it somehow already knows the stream is sorted. – sp00m May 24 '21 at 09:28
  • You might also want to check https://docs.oracle.com/javase/8/docs/api/java/util/stream/package-summary.html#StreamOps on how streams are lazy evaluated, and not just executed right away as soon as possible. – Progman May 24 '21 at 09:29
  • Generally, whether or not `peek` will actually observe a given element is unpredictable and implementation-dependent. It's only meant to be helpful for debugging purposes. – kaya3 May 24 '21 at 09:34
  • In addition to the difference you are asking about, which is covered in the answer, another difference is that `Stream.of(0, 1, 2, 3)` gives you a `Stream<Integer>` whereas `IntStream.range(0, 4)` gives you an `IntStream`, a stream of `int` primitives. – Ole V.V. May 24 '21 at 09:52
  • @kaya3 Isn't the issue just that the stream is lazy and only the first element is evaluated like Progman mentioned? It would be weird for `peek` to randomly decide to not work or to be defined to only work sometimes. – Filipe Rodrigues May 24 '21 at 22:21
  • @FilipeRodrigues You and kaya are saying the same thing only with different words. – Ole V.V. May 25 '21 at 03:49
  • I'm saying that, but I'm also saying it's an implementation detail whether or not an element is consumed, except for the few stream methods which specify this. It would be within spec for `peek` to do lots of different things in many common scenarios. – kaya3 May 25 '21 at 06:08
  • @sp00m and yet it's not able to perform `max` or `min` in `O(1)` ... `IntStream.range(0, 4).peek(System.out::println).max(); // produces 0,1,2,3`. Why not use the same optimisation here? – Naman May 26 '21 at 02:46
  • @Naman Even though IntStream.range produces a stream in ascending order, a sorted stream could in general be ascending or descending. In that case we could still get findFirst in O(1), but for max/min either the leftmost or the rightmost element could be the answer, right? – Gautham M May 26 '21 at 05:21
  • @GauthamM I realized after posting the comment that I _stretched_ things by bringing complexity into the discussion. To answer your question specifically: for a sorted stream, even without knowing the order, the min/max would be a comparison of the first and the last value, so still `O(1)`. But again, I think I shouldn't have brought up complexity in my previous comment, so let's avoid discussing it in this context (where it is mostly irrelevant). – Naman May 26 '21 at 13:22
  • @Naman Yes min/max also requires one step. But is it possible for a stream to get the nth/last element directly? – Gautham M May 26 '21 at 14:39

2 Answers


Well, IntStream.range() returns a sequential ordered IntStream from startInclusive (inclusive) to endExclusive (exclusive) by an incremental step of 1, which means it's already sorted. Since the stream is already known to be sorted, it makes sense that the following .sorted() intermediate operation does nothing. As a result, peek() is executed on just the first element (since the terminal operation only requires the first element).

On the other hand, the elements passed to Stream.of() are not necessarily sorted (and the of() method doesn't check if they are sorted). Therefore, .sorted() must traverse all the elements in order to produce a sorted stream, which allows the findFirst() terminal operation to return the first element of the sorted stream. As a result, peek is executed on all the elements, even though the terminal operation only needs the first element.
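To make the difference measurable rather than reading it off the console, here is a minimal sketch that counts how many elements peek() observes in each pipeline from the question. The class and method names (PeekCountDemo, peeksOnBoxedStream, peeksOnIntRange) are made up for illustration, and the counts depend on the stream implementation, so treat them as an observation and not as a guarantee.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.IntStream;
import java.util.stream.Stream;

// Hypothetical helper class, not part of the question's code.
class PeekCountDemo {

    // Counts the elements that peek() sees for Stream.of(...).sorted().findFirst().
    static int peeksOnBoxedStream() {
        AtomicInteger seen = new AtomicInteger();
        Stream.of(0, 1, 2, 3)
              .peek(e -> seen.incrementAndGet())
              .sorted()      // the source is not flagged as sorted, so this has to buffer everything
              .findFirst();
        return seen.get();
    }

    // Counts the elements that peek() sees for IntStream.range(...).sorted().findFirst().
    static int peeksOnIntRange() {
        AtomicInteger seen = new AtomicInteger();
        IntStream.range(0, 4)
                 .peek(e -> seen.incrementAndGet())
                 .sorted()   // the range is already flagged as sorted, so this can be skipped
                 .findFirst();
        return seen.get();
    }

    public static void main(String[] args) {
        System.out.println("Stream.of:       " + peeksOnBoxedStream());
        System.out.println("IntStream.range: " + peeksOnIntRange());
    }
}
```

On an OpenJDK runtime the two counts should match the output shown in the question (all four elements for Stream.of, only the first for IntStream.range), but nothing in the peek() contract promises this.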

Eran
  • 2 things: 1. is this a compiler optimization? seems like you imply that. 2. Would be helpful if you elaborate on `sorted()` making a difference (for peek) by needing all elements... Good answer, btw. – ernest_k May 24 '21 at 09:57
  • @ernest_k 1. After thinking about it more, compiler optimization seems less likely. I think the `or at least does nothing` part is more likely to be the case. 2. what kind of elaboration do you think is missing? I wrote in the second paragraph why `sorted` usually needs to traverse all the elements. – Eran May 24 '21 at 10:11
  • @ernest_k it is a runtime optimization only, and a bit fragile too – Eugene May 24 '21 at 16:21

IntStream.range is already sorted:

// reports true
System.out.println(
       IntStream.range(0, 4)
                .spliterator()
                .hasCharacteristics(Spliterator.SORTED)
);

So when the sorted() method is invoked on the stream, it internally becomes a no-op.

Otherwise, as you already saw in your first example, all the elements have to be sorted first; only then can findFirst tell which one is "really the first one".
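For comparison, the same characteristic check can be run against both sources from the question. This is a small sketch with a hypothetical helper (SortedFlagDemo.reportsSorted); it only reads the SORTED characteristic from the stream's spliterator and makes no claim about how sorted() uses it internally.

```java
import java.util.Spliterator;
import java.util.stream.BaseStream;
import java.util.stream.IntStream;
import java.util.stream.Stream;

// Hypothetical helper class, introduced only for this illustration.
class SortedFlagDemo {

    // Returns true if the stream's spliterator reports the SORTED characteristic.
    // Note that calling spliterator() is a terminal operation and consumes the stream.
    static boolean reportsSorted(BaseStream<?, ?> stream) {
        return stream.spliterator().hasCharacteristics(Spliterator.SORTED);
    }

    public static void main(String[] args) {
        System.out.println("IntStream.range(0, 4):  " + reportsSorted(IntStream.range(0, 4)));
        System.out.println("Stream.of(0, 1, 2, 3):  " + reportsSorted(Stream.of(0, 1, 2, 3)));
    }
}
```

The range reports the characteristic while Stream.of does not, even though the values passed to Stream.of happen to be in ascending order: the flag describes what the source promises, not what the data looks like.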

Note that this optimization only works for naturally sorted streams, that is, streams sorted without an explicit Comparator. For example:

// prints too much you say?
Stream.of(new User(30), new User(25), new User(34))
            .peek(x -> System.out.println("1 : before I call first sorted"))
            .sorted(Comparator.comparing(User::age))
            .peek(x -> System.out.println("2 : before I call second sorted"))
            .sorted(Comparator.comparing(User::age))
            .findFirst();

where (for brevity):

record User(int age) { }
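A variation on the example above, as a sketch: it sorts the same data twice, once with the no-argument sorted() and once with sorted(Comparator.naturalOrder()), and counts how many elements reach a peek() placed between the two sort steps. Plain Integer values stand in for the User record to keep it short, and the names DoubleSortDemo and peeksBetweenSorts are hypothetical. As elsewhere in this thread, the counts are an implementation detail.

```java
import java.util.Comparator;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.UnaryOperator;
import java.util.stream.Stream;

// Hypothetical helper class, introduced only for this illustration.
class DoubleSortDemo {

    // Applies sortStep twice and counts the elements seen by a peek() between the two sorts.
    static int peeksBetweenSorts(UnaryOperator<Stream<Integer>> sortStep) {
        AtomicInteger seen = new AtomicInteger();
        Stream<Integer> sortedOnce = sortStep.apply(Stream.of(30, 25, 34));
        sortStep.apply(sortedOnce.peek(e -> seen.incrementAndGet()))
                .findFirst();
        return seen.get();
    }

    public static void main(String[] args) {
        // Natural ordering: the second sorted() may be skipped because the stream is flagged SORTED.
        int natural = peeksBetweenSorts(s -> s.sorted());
        // Explicit comparator: the second sort buffers all elements again before emitting any.
        int explicit = peeksBetweenSorts(s -> s.sorted(Comparator.naturalOrder()));

        System.out.println("sorted():                " + natural);
        System.out.println("sorted(naturalOrder()):  " + explicit);
    }
}
```

With an explicit comparator the second sort has to collect every element again, so all three pass through the middle peek(). With the no-argument sorted(), a recent OpenJDK can skip the second sort and let findFirst() stop early, so fewer elements should be observed there.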
Naman
Eugene
  • "Just notice that this optimization only works for naturally sorted streams" I suppose that makes sense, the boolean "SORTED" flag only tracks one type of 'sorted', and it'd be really complex to track other types (you'd need something like a `Map` to track them). – Alexander May 25 '21 at 13:53
  • Possibly relevant to the implementation detail, but if the characteristics of the stream were deterministic to perform a No-Op, could such optimisation not be used in `IntStream.range(0, 4).peek(System.out::println).max(); // outputs 0,1,2,3`? On a sorted list, performing a max and min should be `O(1)` right? – Naman May 25 '21 at 17:30
  • @Naman yup, I remember there was a talk about this a few times. It is just not implemented "yet". – Eugene May 25 '21 at 17:41