0

I am trying to use multi-threading to sort arrays that are stored in a map. There are a large amount of records, ~3.1 million, and thus while I am attempting to sort these records in a single threaded for loop it takes many hours to complete. I'm hoping to get this time down as much as possible, ideally within a few minutes (please don't laugh!).

Stacktrace:

    Exception in thread "main" java.lang.IllegalArgumentException: java.lang.IllegalArgumentException: Comparison method violates its general contract!
    at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:77)
    at java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.base/java.lang.reflect.Constructor.newInstanceWithCaller(Constructor.java:499)
    at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:480)
    at java.base/java.util.concurrent.ForkJoinTask.getThrowableException(ForkJoinTask.java:562)
    at java.base/java.util.concurrent.ForkJoinTask.reportException(ForkJoinTask.java:591)
    at java.base/java.util.concurrent.ForkJoinTask.invoke(ForkJoinTask.java:689)
    at java.base/java.util.stream.ForEachOps$ForEachOp.evaluateParallel(ForEachOps.java:159)
    at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateParallel(ForEachOps.java:173)
    at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:233)
    at java.base/java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:596)
    at java.base/java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:765)
    at com.salesforce.process.Process.startProcess(Process.java:51)
    at com.salesforce.process.Schedule.main(Schedule.java:10)
Caused by: java.lang.IllegalArgumentException: Comparison method violates its general contract!
    at java.base/java.util.TimSort.mergeLo(TimSort.java:781)
    at java.base/java.util.TimSort.mergeAt(TimSort.java:518)
    at java.base/java.util.TimSort.mergeCollapse(TimSort.java:448)
    at java.base/java.util.TimSort.sort(TimSort.java:245)
    at java.base/java.util.Arrays.sort(Arrays.java:1307)
    at java.base/java.util.ArrayList.sort(ArrayList.java:1721)
    at com.salesforce.process.Process.lambda$startProcess$0(Process.java:51)
    at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:183)
    at java.base/java.util.HashMap$ValueSpliterator.forEachRemaining(HashMap.java:1779)
    at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:509)
    at java.base/java.util.stream.ForEachOps$ForEachTask.compute(ForEachOps.java:290)
    at java.base/java.util.concurrent.CountedCompleter.exec(CountedCompleter.java:754)
    at java.base/java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:373)
    at java.base/java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(ForkJoinPool.java:1182)
    at java.base/java.util.concurrent.ForkJoinPool.scan(ForkJoinPool.java:1655)
    at java.base/java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1622)
    at java.base/java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:165)

Class Object:

public class MyObject {
private Integer id;
public String someString;
public Double sortableValue;

... contructors & getters and setters ...
public static Comparator<MyObject> SortableValueComparator = new Comparator<MyObject>() {

    public int compare(MyObject ds1, MyObject ds2) {
       Double sortableValue1 = ds1.getSortableValue();
       Double sortableValue2 = ds2.getSortableValue();
       //descending order      
       if (Double.compare(sortableValue1, sortableValue2) == 0) {
            return 0;
        }
        else if (Double.compare(sortableValue1, sortableValue2) < 0) {
            return -1;
        }
        else {
            return 1;
        }
    }
};

The Code:

And I'm trying to execute this in the code like this:

Map<String,List<MyObject>> map = new HashMap<String,List<MyObject>>();
// inject 3.1 million keys with List<MyObject> values, with 1-10 items in each list.

map.values().parallelStream().forEach(list -> list.sort(MyObject.SortableValueComparator));

NOTE: this is not what I want to do, but I initially wrote the code like this and it worked. That is to say, my comparator works if I do it this way.

for (List<MyObject> list : map.values()) {
            Collections.sort(list, MyObject.SortableValueComparator);    
        }

however, it takes for.ev.er to complete, which sadly is not acceptable for our business case. What can this noob do to make this parallelStream() or some way of threading this work?? If you need more info, please let me know! Thanks so much!!

Edit: I want to also give you guys a sample of the data below. So this is a Map<String,List<MyObject>>.

key (String): "key1", values (List<MyObject>): [{"a",0.0112},{"b",0.12},{"c",0.00512}]
key: "key2", values: [{"d",0.0922},{"a",0.0112},{"f",0.23}]
key: "key3", values: [{"z",0.141},{"w",0.432},{"x",0.0001}]

so, If I wanted to sort key3 list of objects, they would return like this:

key: "key3", values: [{"w",0.432},{"z",0.141},,{"x",0.0001}]

and, I want to do this sort function on every record.

Sam P
  • 11
  • 4
  • 1
    Why don't you just `return Double.compare(sortableValue1, sortableValue2);`? – shmosel Jan 03 '22 at 21:58
  • What is `getSortableValue` doing? Perhaps that is your bottleneck. – Basil Bourque Jan 03 '22 at 21:59
  • 3
    Or you could simplify even further: `Comparator SortableValueComparator = Comparator.comparingDouble(MyObject::getSortableValue));` – shmosel Jan 03 '22 at 21:59
  • 1
    You say “it works fine” yet you show an exception stack trace? I’m confused. – Basil Bourque Jan 03 '22 at 22:02
  • For the Comparator error use Comparator.comparingDouble. for parallel sorting look at the Arrays class, there is a parallelSort method. – Nathan Hughes Jan 03 '22 at 22:03
  • @NathanHughes I think he's trying to sort multiple lists in parallel, not parallel sort one list. – shmosel Jan 03 '22 at 22:40
  • @shmosel thanks for your reply! I have tried that as well, and I get an error still even using `return Double.compare(clickRate2, clickRate1);`. I will try your suggestion of trying to simplify further!! – Sam P Jan 03 '22 at 22:56
  • @BasilBourque sorry, edited my post to make it clearer. A single threaded for loop works fine, but I want to sort multiple lists in parallel. – Sam P Jan 03 '22 at 22:57
  • @NathanHughes, as shmosel has said I am trying to sort multiple Lists in parallel. Thank you shmosel. – Sam P Jan 03 '22 at 22:57
  • @BasilBourque getSortableValue is getting the double value from the object to compare. I want to sort the list descending, using the sortableValue. It should not be too intense as the object has already been passed to the comparator, but I would take any advice at this point. :) – Sam P Jan 03 '22 at 23:11
  • (A) Is `getSortableValue` a simple getter method? (B) Is that method returning a `Double` object or a `double ` primitive? Avoiding unnecessary auto-boxing would help performance. – Basil Bourque Jan 03 '22 at 23:27
  • @BasilBourque in both cases I am using the `Double` object. I did not know that about autoboxing, so I appreciate that nugget! – Sam P Jan 03 '22 at 23:39
  • @user16320675 no sir, nothing outside of what you see in the code there. – Sam P Jan 03 '22 at 23:41
  • @user16320675 I'm happy to make edits if there is something specific you need to see, from my limited understanding, this should be all the info needed? I am also 'generalizing' the code so it's not exactly what I have. – Sam P Jan 03 '22 at 23:47
  • What looks strange is that `TimSort.mergeLo()` / `TimSort.mergeHi()` (where the exception is thrown) is only ever executed when the size of the list to be sorted is greater or equal to `TimSort.MIN_MERGE` (which is 32). If all of your `List` have sizes from 1 to 10 then that code should never be executed... – Thomas Kläger Jan 04 '22 at 07:04
  • All thanks for your help!! I am working on this this week, and hopefully can come back with some updates. @ThomasKläger I didn't notice that, and as I've gone back and worked on suggestions from this thread I discovered that I am over-building `List` for MANY of the lists, and there are huge amounts of duplicates! My next step is to address that problem, but your insight here is actually very helpful and I will keep an eye out for `List` > 32 after I have done that. – Sam P Jan 04 '22 at 18:58
  • Hey all, it works. simply needed to use `stream()` instead of `parallelStream()`. I appreciate all your effort to help me troubleshoot this as I'm learning how to do more software development. Cheers. – Sam P Jan 11 '22 at 16:30

2 Answers2

2

best to put a breakpoint where the exception is thrown and check the values being compared. Then write a unit test that checks what happens when those values are passed in to the comparator + how the result compares with 'equals' on the same two objects. It is very likely that your comparator is returning a 0 value for objects that are not also 'equal' - i.e. the implementation of 'equals' on MyObject compares something other than the sortableValue. This causes problems when merging collections.

So, set a breakpoint, see what values are breaking the contract, capture it in a test or two. Once you've figured it out, it might be that you'll have to add some additional fields (if you don't have control of 'equals' or this is existing code you mustn't change that way) to your comparator to make the 'equals' match.

  • Hi @Chris Ebert I appreciate the advice on how to troubleshoot a comparator. I definitely was wondering how to do that, so I do appreciate it. I will see if I can figure out what you're saying and try it out, this seems like good general advice. – Sam P Jan 03 '22 at 23:08
0

Instead of using

Map.values().parallelStream().forEach(list -> list.sort(comparator))

I used

Map.values().Stream().forEach(list -> list.sort(comparator))

and it worked!

Sam P
  • 11
  • 4