22

I have a series of double values that I want to sum, and I also need their maximum. DoubleStream.summaryStatistics() sounds perfect for that. The getSum() method has an API note that reminds me of something I learned in one of my computer science courses: summation tends to be more accurate if the values are sorted by increasing absolute value. However, DoubleStream does not let me specify a comparator; calling sorted() on the stream always uses the natural ordering (as in Double.compareTo).

Thus I gather the values into a final Stream.Builder<Double> values = Stream.builder(); and call

values.build()
    .sorted(Comparator.comparingDouble(Math::abs))
    .mapToDouble(a -> a).summaryStatistics();

Yet, this looks somewhat lengthy and I would have preferred to use the DoubleStream.Builder instead of the generic builder. Did I miss something or do I really have to use the boxed version of the stream just to be able to specify the comparator?

Tagir Valeev
muued
  • 1
    Do you have a `DoubleStream` given to you by using an external API or are you constructing it by yourself? – Smutje May 18 '15 at 09:34
  • I am constructing it by myself. I could also store them in a double[]. Just used the Stream for the summaryStatistics. – muued May 18 '15 at 09:39

2 Answers

14

Primitive streams have no sorted overload that takes a comparator, so they are always sorted in natural order. But to go back to your underlying problem, there are ways to improve the accuracy of the sum that don't involve sorting the data first.

One such algorithm is the Kahan summation algorithm which happens to be used by the OpenJDK/Oracle JDK internally.

This is admittedly an implementation detail, so the usual caveats apply: non-OpenJDK/Oracle JDKs, or future OpenJDK releases, may take a different approach.

See also this post: In which order should floats be added to get the most precise result?
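
For illustration, here is a minimal sketch of Kahan (compensated) summation combined with a running maximum. KahanStats and its method names are made up for this example and are not part of the JDK; it is not a copy of what DoubleSummaryStatistics does internally, and it ignores NaN and infinite inputs.

```java
import java.util.Arrays;
import java.util.stream.DoubleStream;

/** Hypothetical helper (not a JDK class): compensated sum plus running maximum. */
final class KahanStats {
    private double sum;          // running sum
    private double compensation; // rounding error accumulated so far
    private double max = Double.NEGATIVE_INFINITY;

    /** One Kahan step: adds value to the sum and tracks the lost low-order bits. */
    private void add(double value) {
        double y = value - compensation; // re-apply the error carried from earlier steps
        double t = sum + y;              // low-order bits of y may be lost here
        compensation = (t - sum) - y;    // what was actually added minus what should have been
        sum = t;
    }

    /** Accumulates one stream element. */
    public void accept(double value) {
        add(value);
        max = Math.max(max, value);
    }

    /** Merges another partial result; only invoked for parallel streams. */
    public void combine(KahanStats other) {
        add(other.sum);
        add(-other.compensation); // the other side's best estimate is sum - compensation
        max = Math.max(max, other.max);
    }

    public double getSum() { return sum; }

    public double getMax() { return max; }

    /** Convenience factory using the three-argument DoubleStream.collect. */
    public static KahanStats of(double... values) {
        return DoubleStream.of(values)
                .collect(KahanStats::new, KahanStats::accept, KahanStats::combine);
    }

    public static void main(String[] args) {
        // Made-up data: one large value followed by many tiny ones.
        double[] data = new double[1_000_001];
        Arrays.fill(data, 1e-16);
        data[0] = 1.0;

        double naive = 0.0;
        for (double d : data) {
            naive += d; // plain left-to-right summation for comparison
        }

        KahanStats stats = KahanStats.of(data);
        System.out.println("naive sum = " + naive);
        System.out.println("Kahan sum = " + stats.getSum());
        System.out.println("max       = " + stats.getMax());
    }
}
```

The class has the supplier/accumulator/combiner shape that DoubleStream.collect expects, so it needs neither sorting nor boxing.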

assylias
  • So your suggestion is to just use the `sum` function and let the JDK do the tricky part? This would mean ignoring the API note and raise the question why they put it there in the first place. (Implementing the Kahan summation algorithm for my problem sounds like reinventing the wheel) – muued May 18 '15 at 10:10
  • 1
    @muued That's what I do, because I know that (a) my users don't use alternative JDKs and (b) I am fairly confident that OpenJDK would not move to a "worse" algorithm in the future. But there is no guarantee so it's your call really! If you are not comfortable using non documented features you can adapt the (open source, but GPL 2, which may not be suitable for you) code of DoubleSummaryStatistics in your own class and call it with: `doubleStream.collect(YourStats::new, YourStats::accept, YourStats::combine);` – assylias May 18 '15 at 10:19
  • 3
    If you're really paranoid, you can also test the implementation on application startup. Though that's a very non-Java way of going about things. :) – biziclop May 18 '15 at 10:28
11

The only way to sort a DoubleStream with a custom comparator is to box and unbox it:

double[] input = //...
DoubleStream.of(input).boxed()
    .sorted(Comparator.comparingDouble(Math::abs))
    .mapToDouble(a -> a).summaryStatistics();

However, since Kahan summation is used internally, the difference should not be very significant. In most applications, unsorted input will yield good accuracy. Of course, you should test for yourself whether unsorted summation is satisfactory for your particular task.
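
One way to run such a test is sketched below: it sums the same array in encounter order, sorted by absolute value, and with a plain loop, so the results can be compared. SummationCheck, its method names and the sample values are made up for illustration; substitute your own data.

```java
import java.util.Comparator;
import java.util.stream.DoubleStream;

/** Hypothetical test harness comparing summation orders on the same data. */
final class SummationCheck {

    /** Sums in encounter order, relying on whatever DoubleStream.sum() does internally. */
    public static double unsortedSum(double[] values) {
        return DoubleStream.of(values).sum();
    }

    /** Sums after sorting by increasing absolute value (requires boxing). */
    public static double sortedByMagnitudeSum(double[] values) {
        return DoubleStream.of(values).boxed()
                .sorted(Comparator.comparingDouble(Math::abs))
                .mapToDouble(Double::doubleValue)
                .sum();
    }

    /** Plain left-to-right loop without any compensation, as a baseline. */
    public static double naiveSum(double[] values) {
        double sum = 0.0;
        for (double v : values) {
            sum += v;
        }
        return sum;
    }

    public static void main(String[] args) {
        // Made-up sample with values of very different magnitudes.
        double[] values = {1e16, 1.0, -1e16, 1.0, 3.5, -2.25};

        double unsorted = unsortedSum(values);
        double sorted = sortedByMagnitudeSum(values);
        double naive = naiveSum(values);

        System.out.println("unsorted stream sum = " + unsorted);
        System.out.println("sorted stream sum   = " + sorted);
        System.out.println("naive loop sum      = " + naive);
        System.out.println("|unsorted - sorted| = " + Math.abs(unsorted - sorted));
    }
}
```

If the unsorted and sorted results agree to within the tolerance your task needs, the boxing and sorting step can be dropped.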

Tagir Valeev