17

I have a question about how the intermediate stages of a sequential stream are executed: are the operations of one stage applied to all the input items before the next stage runs, or are all the stages applied to each item in turn?

I'm aware the question might not be easy to understand, so I'll give an example. Given the following stream pipeline:

List<String> strings = Arrays.asList("Are Java streams intermediate stages sequential?".split(" "));
strings.stream()
           .filter(word -> word.length() > 4)
           .peek(word -> System.out.println("f: " + word))
           .map(word -> word.length())
           .peek(length -> System.out.println("m: " + length))
           .forEach(length -> System.out.println("-> " + length + "\n"));

My expectation for this code is that it will output:

f: streams
f: intermediate
f: stages
f: sequential?

m: 7
m: 12
m: 6
m: 11

-> 7
-> 12
-> 6
-> 11

Instead, the output is:

f: streams
m: 7
-> 7

f: intermediate
m: 12
-> 12

f: stages
m: 6
-> 6

f: sequential?
m: 11
-> 11

Is this interleaving only an artifact of the console output, or is each item really processed through all the stages, one item at a time?

I can further detail the question, if it's not clear enough.

Stefan Zobel
Bogdan Solga

4 Answers

25

This behaviour enables optimisation of the code. If each intermediate operation were to process all elements of a stream before proceeding to the next intermediate operation then there would be no chance of optimisation.

So to answer your question, each element moves along the stream pipeline vertically one at a time (except for some stateful operations discussed later), therefore enabling optimisation where possible.

Explanation

Given the example you've provided, each element will move along the stream pipeline vertically one by one as there is no stateful operation included.
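To make that ordering checkable rather than something read off the console, here is a minimal sketch that runs the pipeline from the question but records each step in a list instead of printing it. The class name `VerticalOrderDemo` and the `trace` method are illustrative names, not part of the original code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class VerticalOrderDemo {

    // Runs the question's pipeline on a sequential stream and records
    // every step in order, so the interleaving can be inspected afterwards.
    static List<String> trace() {
        List<String> events = new ArrayList<>();
        List<String> words = Arrays.asList(
                "Are Java streams intermediate stages sequential?".split(" "));

        words.stream()
             .filter(word -> word.length() > 4)
             .peek(word -> events.add("f: " + word))
             .map(String::length)
             .peek(length -> events.add("m: " + length))
             .forEach(length -> events.add("-> " + length));

        return events;
    }

    public static void main(String[] args) {
        // Prints f/m/-> for "streams", then for "intermediate", and so on.
        trace().forEach(System.out::println);
    }
}
```

The recorded list has the same interleaved order as the console output in the question, so the interleaving reflects the actual processing order and is not a buffering effect.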

As another example, say you were looking for the first String whose length is greater than 4. Processing all the elements before providing the result would be unnecessary and time-consuming.

Consider this simple illustration:

List<String> stringsList = Arrays.asList("1","12","123","1234","12345","123456","1234567");
int result = stringsList.stream()
                        .filter(s -> s.length() > 4)
                        .mapToInt(Integer::valueOf)
                        .findFirst().orElse(0);

The filter operation above does not first collect every element whose length is greater than 4 into a new stream. Instead, filter examines the elements one at a time; as soon as the first element longer than 4 characters ("12345") passes, it flows on to mapToInt and then to findFirst, which reports that it has found its element, and execution stops there. Therefore the result is 12345.
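One way to check this is to add a peek in front of the filter and record which elements are actually pulled from the source. This is a sketch only; `FindFirstDemo` and the `seenByFilter` parameter are hypothetical names added for illustration, and `Integer::parseInt` is used in place of `Integer::valueOf` purely to avoid boxing.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class FindFirstDemo {

    // Returns the first value longer than 4 characters and records in
    // seenByFilter every element that reached the filter stage.
    static int firstLongerThanFour(List<String> seenByFilter) {
        List<String> stringsList =
                Arrays.asList("1", "12", "123", "1234", "12345", "123456", "1234567");

        return stringsList.stream()
                          .peek(seenByFilter::add)        // runs just before filter
                          .filter(s -> s.length() > 4)
                          .mapToInt(Integer::parseInt)
                          .findFirst()
                          .orElse(0);
    }

    public static void main(String[] args) {
        List<String> seen = new ArrayList<>();
        int result = firstLongerThanFour(seen);
        System.out.println(result); // 12345
        System.out.println(seen);   // [1, 12, 123, 1234, 12345]
    }
}
```

On a sequential stream, "123456" and "1234567" never reach the filter, because findFirst ends the traversal as soon as it has an element.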

Behaviour of stateful and stateless intermediate operations

Note that when a stateful intermediate operation such as sorted is included in a stream pipeline, that specific operation will traverse the entire stream. If you think about it, this makes complete sense: in order to sort elements, you need to see all of them to determine which come first in the sort order.
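The barrier effect of sorted can be seen by placing a peek on each side of it. The following is a small sketch with made-up input values; `SortedBarrierDemo` is an illustrative name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

class SortedBarrierDemo {

    // Records when each element is seen before and after sorted().
    static List<String> trace() {
        List<String> events = new ArrayList<>();

        Stream.of("c", "a", "b")
              .peek(s -> events.add("before: " + s))
              .sorted()                                // must buffer all elements first
              .peek(s -> events.add("after: " + s))
              .forEach(s -> { });                      // terminal operation triggers the run

        return events;
    }

    public static void main(String[] args) {
        // All "before" entries appear ahead of the first "after" entry.
        trace().forEach(System.out::println);
    }
}
```

The stages upstream of sorted still handle one element at a time; it is only sorted itself that has to wait for the last element before it can emit the first.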

The distinct intermediate operation is also stateful, because it has to remember the elements it has already seen. However, as @Holger has mentioned, unlike sorted it does not require traversing the entire stream: each distinct element can be passed down the pipeline immediately and may fulfil a short-circuiting condition.
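A sketch along the lines of the tests described in the comments, assuming a sequential stream (the comments note that ordered parallel streams may process many more elements). `DistinctDemo` and the input values are illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

class DistinctDemo {

    // Records elements entering and leaving distinct() until anyMatch is satisfied.
    static List<String> trace() {
        List<String> events = new ArrayList<>();

        Stream.of(1, 2, 3, 3, 3, 4)
              .peek(x -> events.add("in: " + x))
              .distinct()
              .peek(x -> events.add("out: " + x))
              .anyMatch(x -> x == 3);              // short-circuits on the first 3

        return events;
    }

    public static void main(String[] args) {
        // Each new element leaves distinct() right after it enters.
        trace().forEach(System.out::println);
    }
}
```

In this run the duplicate 3s and the 4 are never pulled from the source, which is the difference from sorted: distinct keeps state but does not act as a barrier.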

Stateless intermediate operations such as filter and map do not have to traverse the entire stream and can freely process one element at a time vertically, as mentioned above.

Last but not least, it's important to note that when the terminal operation is a short-circuiting operation, it can finish before all the elements of the underlying stream have been traversed.

Further reading: Java 8 stream tutorial

Ousmane D.
  • Thanks a lot for the details, Aominè! The optimization certainly makes sense; I'm surprised I haven't seen any references to it, until now. Do you have any links / references on where I could read more? – Bogdan Solga Jul 01 '17 at 16:03
  • @BogdanSolga I have appended a good article about it. the description under _Processing Order_ describes in detail what I've mentioned within my answer. – Ousmane D. Jul 01 '17 at 16:11
  • hi, there is no short-circuiting operation in the OP's question. why did you say short-circuiting? it is *filtering*. – holi-java Jul 01 '17 at 18:15
  • @holi-java you're correct, I guess I started to speak about the code example I've provided before even showing it. will edit though. Thanks for the suggestion. – Ousmane D. Jul 01 '17 at 18:17
  • "The example above will not process all the elements greater than 4 " - the filtering behavior is inverted here. it should be process. – holi-java Jul 01 '17 at 18:22
  • @holi-java the wording might not be the best but what I meant is only the first `s.length() > 4` will move through to the `mapToInt` rather than all the ones greater than `4`. but ofcourse the `filter` will process the elements from `"1"` to `"12345"`. – Ousmane D. Jul 01 '17 at 18:26
  • first, up-vote. maybe you need to say that when the terminal operation is a short-circuiting operation, the `filter` operation will stop once the first matching element is found. – holi-java Jul 01 '17 at 18:36
  • @holi-java Sure, I will mention that as well. ^^ – Ousmane D. Jul 01 '17 at 18:44
  • @BogdanSolga: can you please accept the answer? It will help someone who comes across this question. – Adithya Dec 29 '17 at 16:24
  • When you are revising the answer anyway, note that `distinct()` is a stateful intermediate operation as it has to remember all previously encountered distinct elements, but unlike `sorted`, it does *not* require traversing the entire stream as each distinct element can get passed down the pipeline immediately and may fulfill a short-circuiting condition. – Holger Jan 16 '18 at 09:09
  • @Holger that's very interesting, does this happen at the moment in 8 or 9? IIRC it does traverse the entire stream, even if there is a short-circuiting terminal operation – Eugene Jan 16 '18 at 09:24
  • @Eugene just tested with Java 8, e.g. `IntStream.range(0, 10).peek(System.out::println) .map(i->i>>1).distinct().anyMatch(i->i>1);` does not process numbers beyond `4`. Note that when you use `distinct()` with streams that are ordered *and* parallel, there is a high chance that workers process lots of excess elements up to the entire stream, as a side effect of maintaining the ordering constraint. – Holger Jan 16 '18 at 09:32
  • @Holger I did a test too `Stream.of(1, 2, 3, 3, 3, 4) .peek(System.out::println) .distinct() .peek(x -> System.out.println("After " + x)) .findFirst();` and indeed `distinct` does not process the entire Stream... thank you – Eugene Jan 16 '18 at 09:34
  • @Holger as always very much appreciated for your help. thanks Eugene too ;-) – Ousmane D. Jan 16 '18 at 10:53
4

The answer is loop fusion. The four intermediate operations filter(), peek(), map() and peek(), together with the terminal operation forEach() that does the printing, are logically joined into a single pass over the data. They are executed in order for each individual element. Joining operations into a single pass like this is an optimisation technique known as loop fusion.
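Conceptually, the fused pipeline from the question behaves like a single hand-written loop. The sketch below is only an analogy for the observable order of operations, not a claim about the code the stream library actually executes; `LoopFusionSketch` is an illustrative name.

```java
import java.util.ArrayList;
import java.util.List;

class LoopFusionSketch {

    // One pass over the input; every stage is applied to an element
    // before the next element is touched.
    static List<String> fusedLoop() {
        List<String> events = new ArrayList<>();
        String[] words = "Are Java streams intermediate stages sequential?".split(" ");

        for (String word : words) {
            if (word.length() <= 4) {
                continue;                      // filter: drop short words
            }
            events.add("f: " + word);          // first peek
            int length = word.length();        // map
            events.add("m: " + length);        // second peek
            events.add("-> " + length);        // forEach
        }
        return events;
    }

    public static void main(String[] args) {
        fusedLoop().forEach(System.out::println);
    }
}
```

The loop yields the same sequence of steps as the stream version in the question, which is why the output there is grouped per word rather than per stage.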

More for reading: Source

2

Intermediate operations are always executed lazily. That is to say, they are not run until a terminal operation is reached. A few of the most popular intermediate operations used in a stream are:

filter – the filter operation returns a stream of the elements that satisfy the predicate passed in as a parameter to the operation. The elements have the same type before and after the filter, but the number of elements will likely change.

map – the map operation returns a stream of elements after they have been processed by the function passed in as a parameter. The elements before and after the mapping may have a different type, but there will be the same total number of elements.

distinct – the distinct operation behaves like a filter that drops duplicates. It returns a stream of elements such that each element is unique in the stream, based on the equals method of the elements.
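The laziness described above can be observed by building a pipeline, noting that nothing has run yet, and only then calling a terminal operation. This is a minimal sketch with made-up input; `LazinessDemo` is an illustrative name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class LazinessDemo {

    // Records when the source elements are actually consumed.
    static List<String> trace() {
        List<String> events = new ArrayList<>();

        // Only intermediate operations so far: nothing is executed here.
        Stream<Integer> lengths = Stream.of("a", "bb", "a", "ccc")
                .peek(s -> events.add("saw: " + s))
                .distinct()                     // drops the second "a"
                .map(String::length);           // String -> Integer, same count

        events.add("pipeline built");

        // The terminal operation starts the traversal.
        List<Integer> result = lengths.collect(Collectors.toList());
        events.add("result: " + result);

        return events;
    }

    public static void main(String[] args) {
        trace().forEach(System.out::println);
    }
}
```

The "pipeline built" entry is recorded before any "saw" entry, and the final result is [1, 2, 3] because distinct removed the repeated "a" before map ran.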

.java-8-streams-cheat-sheet

jithin joseph
2

Apart from optimisation, the order of processing you describe wouldn't work for streams of indeterminate length, like this:

DoubleStream.generate(Math::random).filter(d -> d > 0.9).findFirst();

Admittedly this example doesn't make much sense in practice, but the point is that rather than being backed by a fixed-size collection, DoubleStream.generate() creates a potentially infinite stream. The only way to process this is element by element.
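Here is a reproducible variant of the same idea, as a sketch: it swaps the random source for a deterministic infinite counter (an assumption made only so the outcome is predictable) and counts how many elements are pulled before findFirst is satisfied. `InfiniteStreamDemo` and `firstMultipleOfSeven` are illustrative names.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Stream;

class InfiniteStreamDemo {

    // Returns { first multiple of 7, number of elements pulled from the source }.
    static int[] firstMultipleOfSeven() {
        AtomicInteger examined = new AtomicInteger();

        int first = Stream.iterate(1, n -> n + 1)              // 1, 2, 3, ... never ends
                          .peek(n -> examined.incrementAndGet())
                          .filter(n -> n % 7 == 0)
                          .findFirst()
                          .orElse(-1);                          // not reached for this source

        return new int[] { first, examined.get() };
    }

    public static void main(String[] args) {
        int[] r = firstMultipleOfSeven();
        System.out.println("first = " + r[0] + ", examined = " + r[1]); // first = 7, examined = 7
    }
}
```

If filter had to finish with the whole source before findFirst could run, this call would never return; it terminates only because each element travels through the pipeline on its own.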

biziclop