6

I expected simple intermediate stream operations, such as limit(), to have very little overhead. But the difference in throughput between these examples is actually significant:

final long MAX = 5_000_000_000L;

LongStream.rangeClosed(0, MAX)
          .count();
// throughput: 1.7 bn values/second


LongStream.rangeClosed(0, MAX)
          .limit(MAX)
          .count();
// throughput: 780m values/second

LongStream.rangeClosed(0, MAX)
          .limit(MAX)
          .limit(MAX)
          .count();
// throughput: 130m values/second

LongStream.rangeClosed(0, MAX)
          .limit(MAX)
          .limit(MAX)
          .limit(MAX)
          .count();
// throughput: 65m values/second

I am curious: What is the reason for the quickly degrading throughput? Is it a consistent pattern with chained stream operations or my test setup? (I did not use JMH so far, just set up a quick experiment with a stopwatch)
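For context, the setup was essentially the following stopwatch sketch (scaled down here to 50 million values so it runs quickly; the `throughput` helper is my own naming, not part of any benchmark framework, which is also why the numbers above should be taken with a grain of salt):

```java
import java.util.function.LongSupplier;
import java.util.stream.LongStream;

public class LimitBenchmark {

    // Runs the pipeline once and returns processed values per second.
    static double throughput(LongSupplier pipeline) {
        long start = System.nanoTime();
        long count = pipeline.getAsLong();
        double seconds = (System.nanoTime() - start) / 1e9;
        return count / seconds;
    }

    public static void main(String[] args) {
        final long N = 50_000_000L; // scaled down from 5_000_000_000L

        System.out.printf("no limit:  %.0f values/s%n",
                throughput(() -> LongStream.rangeClosed(0, N).count()));
        System.out.printf("1x limit:  %.0f values/s%n",
                throughput(() -> LongStream.rangeClosed(0, N).limit(N).count()));
        System.out.printf("3x limit:  %.0f values/s%n",
                throughput(() -> LongStream.rangeClosed(0, N)
                        .limit(N).limit(N).limit(N).count()));
    }
}
```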

ernest_k

2 Answers

4

limit will result in a slice being made of the stream, backed by a split iterator (to support parallel operation). In one word: inefficient. That is a large overhead for what is effectively a no-op here. And that two consecutive limit calls result in two slices is a shame.

You should take a look at the implementation of IntStream.limit.
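You can also see the slicing from the outside without reading the JDK sources (the exact spliterator class names are implementation details and may differ between JDK versions, so treat this as exploratory, not as a stable API):

```java
import java.util.stream.LongStream;

public class SliceInspection {
    public static void main(String[] args) {
        // Spliterator of a plain range:
        System.out.println(LongStream.rangeClosed(0, 10)
                .spliterator().getClass());

        // Spliterator after limit - a different, wrapping/slice type:
        System.out.println(LongStream.rangeClosed(0, 10)
                .limit(10)
                .spliterator().getClass());

        // The result is unchanged either way; only the plumbing differs:
        System.out.println(LongStream.rangeClosed(0, 10).limit(10).count()); // prints 10
    }
}
```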

As Streams are still relatively new, optimization should come last, once production code exists. Calling limit three times seems a bit far-fetched.

Joop Eggen
  • So this is just a question of sub-optimal implementation...? Is there any workaround? We have to admit that the overhead is impressive (as impressive as `limit (x3)` is far-fetched or artificial) – ernest_k Oct 04 '18 at 13:01
  • Chained stream expressions tend to be expressive without much overhead, so it is a matter of smart data structures and algorithms. As soon as you see something more complex, with grouping for instance, there might be simpler data structures to avoid such structural processing. So _no_, there is no special medicine. – Joop Eggen Oct 04 '18 at 13:09
  • @JoopEggen the only medicine is in the Stream implementation itself, I guess – Eugene Oct 04 '18 at 14:18
  • @Eugene I have seen Stream code with grouping / a temporary list / flatMap, that just as well could be done by a much simpler, more direct Stream snippet using references to variables (AtomicInteger). Grouping itself is nice, not bad, but Streams are not as declarative as SQL or functional languages, where under the hood strong optimizations may happen. – Joop Eggen Oct 04 '18 at 14:25
  • @JoopEggen I also tried this with `sequential()` in front, following the [Java SE](https://docs.oracle.com/javase/8/docs/api/java/util/stream/Stream.html#limit-long-) guidance to enforce the cheapest execution path - no effect though. Limiting 3 times is probably far-fetched. Two calls might be realistic if the stream operations are scattered across different application layers and there are map/filter operations in the middle. In the end I was impressed by the significant drop in performance and curious about the cause. –  Oct 04 '18 at 18:17
  • 1
    @JoopEggen this is not an issue of the API design. We have an alternative implementation in use (a Java 7 compatible back-port), which does not suffer from these issues at all. There, applying `limit` directly after `rangeClosed` has the same effect as if you used the adapted range in the first place. Likewise, multiple subsequent `limit` operations would work as if you did one `limit` with the smallest number only. However, I still don’t have the feeling that this difference has much relevance in practice. What matters more, is avoiding these overly long call chains (avoid the inlining limit) – Holger Oct 05 '18 at 07:29
4

This is an under-implementation in the Stream API (I don't know what else to call it).

In the first example, the count is known without actually counting - there are no operations (a filter, for example) that might clear the internal flag called SIZED. It's actually a bit interesting if you change this and inspect:

System.out.println(
            LongStream.rangeClosed(0, Long.MAX_VALUE)
                    .spliterator()
                    .hasCharacteristics(Spliterator.SIZED)); // reports false

System.out.println(
            LongStream.rangeClosed(0, Long.MAX_VALUE - 1) // -1 here
                    .spliterator()
                    .hasCharacteristics(Spliterator.SIZED)); // reports true

And limit - even though there are (AFAIK) no fundamental limitations - does not preserve the SIZED flag:

System.out.println(LongStream.rangeClosed(0, MAX)
            .limit(MAX)
            .spliterator()
            .hasCharacteristics(Spliterator.SIZED)); // reports false

Since you call count() everywhere, and the Stream API internally does not know whether the stream is SIZED, it simply counts element by element; whereas if the stream is SIZED, reporting the count would be, well, instant.
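The same difference shows up via `Spliterator.getExactSizeIfKnown()`, which returns the size for a SIZED spliterator and -1 otherwise:

```java
import java.util.stream.LongStream;

public class SizedDemo {
    public static void main(String[] args) {
        // Plain range: the size is known up front.
        System.out.println(LongStream.rangeClosed(0, 100)
                .spliterator().getExactSizeIfKnown());   // prints 101

        // After limit: the exact size is no longer reported
        // (-1 on Java 8, where the SIZED flag is lost).
        System.out.println(LongStream.rangeClosed(0, 100)
                .limit(100)
                .spliterator().getExactSizeIfKnown());
    }
}
```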

When you add limit a few times, you only make it worse, since each limit has to be checked on top of the previous ones, for every single element.

Things have improved in java-9 for example, for the case:

System.out.println(LongStream.rangeClosed(0, MAX)
            .map(x -> {
                System.out.println(x);
                return x;
            })
            .count());

In this case map is not computed at all, since there is no need to - no intermediate operation changes the size of the stream.

Theoretically the Stream API could see that you are limiting and 1) preserve the SIZED flag, 2) see that you have multiple calls of limit and collapse them into a single limit with the smallest bound. At the moment this is not done, but it has a very limited scope - how many people would abuse limit this way? So don't expect any improvements on this part soon.
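Until then, that collapsing is easy to do on the caller's side - compute one effective limit before building the pipeline (the variable names here are mine, for illustration):

```java
import java.util.stream.LongStream;

public class CollapsedLimit {
    public static void main(String[] args) {
        final long MAX = 50_000_000L;
        long limitA = MAX, limitB = MAX, limitC = MAX;

        // Equivalent to .limit(limitA).limit(limitB).limit(limitC),
        // but pays for a single slice instead of three:
        long effectiveLimit = Math.min(limitA, Math.min(limitB, limitC));

        long count = LongStream.rangeClosed(0, MAX)
                .limit(effectiveLimit)
                .count();
        System.out.println(count); // prints 50000000
    }
}
```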

Eugene
  • An interesting detail and good to know that things are optimized with later Java releases. But the real cause should be somewhere else since `limit()` (at least in theory) is an extremely cheap operation and calling it 3 times should not cause a drop of performance in the range of 1-2 orders of magnitude. –  Oct 04 '18 at 18:43
  • @Matthias why do you think it would be an extremely cheap operation? As far as I can see from the implementation, it checks the `limit`, every time, before handing the element further. That would be a lot of checks; but it might also very well be the way you're testing this – Eugene Oct 04 '18 at 19:04
  • Because the [documentation](https://docs.oracle.com/javase/8/docs/api/java/util/stream/Stream.html#limit-long-) hinted in that direction, plus I let myself be deceived by the more-than-linear drop in throughput (and was also hoping for CPU optimizations to kick in on the extremely predictable execution pattern during the looping/iteration). After a brief test with dedicated iterator-based limiting I see similar results as with streams. So my initial assumptions were obviously unrealistic. –  Oct 04 '18 at 20:46
  • @Matthias it would be very cool if you actually showed your tests, I assume JMH based? – Eugene Oct 04 '18 at 21:09