2

I have a list of TxnTypes with different complexities (processing durations).
I want to find the matching TxnType in the list.
I tried to implement this by combining the stream's parallel processing with its short-circuiting filter,
but the two do not seem to work together.
I wrote the sample below. Every run shows that the elements are processed in parallel, but the stream does not terminate as soon as the item is found.

    class TxnType {
        public String id;   
        public TxnType(String id) {this.id = id;}
       
        public boolean match(String entry) {
            Date s = new Date();
            // simulate long processing match time TxnType
            if (id.equals("1")) {
                try {
                    Thread.sleep(4000);
                } catch (InterruptedException e) {
                    e.printStackTrace();
                }
            }
            Date f = new Date();
            System.out.println("check id = " + id+ "  duration = "+(f.getTime()- s.getTime()));
    
            return id.equalsIgnoreCase(entry);
        }
    }

    private void test4() {
        // build list of available TxnTypes
        ArrayList<TxnType> lst = new ArrayList<>();
        lst.add(new TxnType("0"));
        lst.add(new TxnType("1"));  // long match processing time type
        lst.add(new TxnType("2"));
        lst.add(new TxnType("3"));
        lst.add(new TxnType("4"));

        String searchFor = "3";
        System.out.println("searchFor = " + searchFor);
        Date st, fi;

        st = new Date();
        Optional<TxnType> found2 = lst.stream().parallel().filter(txnType -> txnType.match(searchFor)).findFirst();
        System.out.println("found.stream().count() = " + found2.stream().count());
        fi = new Date();
        System.out.println("dur=" + (fi.getTime() - st.getTime()));
    }

After running it several times, I found that the processing does not terminate as soon as possible; it waits until all elements have been processed.

searchFor = 3
check id = 4  duration = 0
check id = 2  duration = 0
check id = 3  duration = 0
check id = 0  duration = 0
check id = 1  duration = 4005
found.stream().count() = 1
dur=4050

Is there something like a FilterFindFirst()?

Gautham M
Moh Tarvirdi
  • Since the work is split across the available threads, each thread would have **started** processing its elements before the match was found. I would suggest running with a longer list (maybe 8 to 10 elements) to see the short-circuiting in action. [more info](https://stackoverflow.com/a/67690928/7804477) – Gautham M Jun 26 '21 at 07:55
  • Looks similar to [How to stop parallel stream once findAny match found](https://stackoverflow.com/questions/52411986/how-to-stop-parallel-stream-once-findany-match-found/52412412) – Vasyl Sarzhynskyi Jun 26 '21 at 08:08

3 Answers

1

Your mistake is using findFirst rather than findAny.

Note that 1 is ordered before the element that you expect to be found (3). So the stream has to finish checking 1 before it can conclude that "3 is the first element that matches the predicate", even if the checks run in parallel. If it has found 3 and hasn't yet started checking something further down the list, then it won't start that check. This is what short-circuiting in findFirst means.

findAny, on the other hand, doesn't care about the order. Once it finds any element that satisfies the predicate, it does not start checking anything new.
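The difference in ordering guarantees can be seen in a small sketch that does not use the question's TxnType at all. The class name `FindFirstVsFindAny` and the integer list are made up for illustration. With findFirst, a parallel stream must still return the first match in encounter order, so the result is always the same; with findAny, any matching element is an acceptable result.

```java
import java.util.List;
import java.util.Optional;

class FindFirstVsFindAny {
    // Hypothetical sample data: the even numbers are 4, 8 and 6, in that encounter order.
    static final List<Integer> NUMBERS = List.of(1, 3, 4, 5, 8, 6, 7);

    static Optional<Integer> firstEven() {
        // findFirst respects encounter order, even in parallel: the result is always 4.
        return NUMBERS.parallelStream().filter(n -> n % 2 == 0).findFirst();
    }

    static Optional<Integer> anyEven() {
        // findAny may return 4, 8 or 6, whichever match a thread reports first.
        return NUMBERS.parallelStream().filter(n -> n % 2 == 0).findAny();
    }

    public static void main(String[] args) {
        System.out.println("findFirst = " + firstEven().get());
        System.out.println("findAny   = " + anyEven().get());
    }
}
```

Because findFirst has to honour that ordering guarantee, it cannot return a later match while an earlier element is still being checked.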

Now, even if you change to findAny, you may still find that it takes 4 seconds to complete. This is because there are too few elements in the list compared to how many threads the stream pipeline can use. So the processing of all the elements starts, and once it has started, it will not be interrupted, even if an element that satisfies the predicate has already been found.

If you put more elements into the list:

    for (int i = 0; i < 100; i++) {
        lst.add(new TxnType("foo"));
    }

    ...

    Optional<TxnType> found2 = lst.parallelStream().filter(txnType -> txnType.match(searchFor)).findAny();

Then the processing of 1 is less likely to be started before the processing of 3 finishes, and you will get a much quicker run. This will not happen every time though. There's no guarantee that 1 won't get processed before 3.

Basically, the short-circuiting is working correctly. It's just that

  • findFirst won't short-circuit as aggressively as you'd like
  • your list has too few elements, and your computer has enough cores to process them all at once, so it does.
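A self-contained way to try this out is sketched below. It is not the question's code: the class `FindAnyLongList`, the `CHECKED` counter, and the use of plain String ids instead of TxnType are assumptions made to keep the example short, and the sleep is shortened to 500 ms. The counter shows how many elements the predicate actually ran on, which varies from run to run.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.concurrent.atomic.AtomicInteger;

class FindAnyLongList {
    // Counts how many times the predicate ran (hypothetical helper, not in the question).
    static final AtomicInteger CHECKED = new AtomicInteger();

    // Stand-in for TxnType.match: id "1" is slow, everything else is instant.
    static boolean match(String id, String entry) {
        CHECKED.incrementAndGet();
        if (id.equals("1")) {
            try {
                Thread.sleep(500); // shortened from the question's 4000 ms
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return id.equalsIgnoreCase(entry);
    }

    static List<String> buildIds() {
        List<String> ids = new ArrayList<>(List.of("0", "1", "2", "3", "4"));
        for (int i = 0; i < 100; i++) {
            ids.add("foo"); // padding elements that never match
        }
        return ids;
    }

    static Optional<String> search(String searchFor) {
        return buildIds().parallelStream()
                .filter(id -> match(id, searchFor))
                .findAny();
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        Optional<String> found = search("3");
        long millis = (System.nanoTime() - start) / 1_000_000;
        System.out.println("found = " + found.orElse("none"));
        System.out.println("checked " + CHECKED.get() + " of " + buildIds().size()
                + " elements in " + millis + " ms");
    }
}
```

On some runs the count stays well below the list size and the slow element is never touched; on others the slow element is picked up early and the call still takes the full sleep time.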
Sweeper
1

If you look at the source code of FindOps.FindTask.doLeaf, you can see that the operation is executed first, and only afterwards is it checked whether a result was found.

Assuming 4 cores, one element of the list would be passed to each thread. As soon as one thread completes execution for one element, it becomes free and picks the next available element, if any. In your case, it is quite likely that the 5th element of the list was picked for execution before a match was found.

If you try with a longer list, you could see the short circuiting in action.

Gautham M
1

1 - A list with 5 elements is way too small; each thread is probably processing just one element, at most 2.

2 - Since the second element is the one taking more time, it will most probably get started before any other check has finished, that is, before any element can be found. Once the match() method has started, it will not be interrupted.

Try with more elements: 50 or even 100 [1].
Change the long-running element to be the 10th or later.
For fun, also add Thread.currentThread().getId() to the output of match().

You will see that

  • the long-running thread is not always started, even if using findAny() [2]
  • not all elements are checked

3 - Just a note: it is strange to see Stream mixed with Date (I would prefer using the java.time classes).


[1] Tested myself with 50 elements, "9" being the slow one, searching for "15" (or "3").
[2] I could not see any relevant difference (none at all with only 5 elements) between findAny() and findFirst() (against my first guess).
The order of checking is random with both; sometimes findAny() ran the slow thread while findFirst() did not.
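The experiment described above could be set up roughly as follows. This is only a sketch under assumptions: the class `ThreadLoggingDemo` and the `CHECKED_BY` map are hypothetical, String ids replace TxnType, the sleep is shortened to 500 ms, and the thread name is recorded instead of Thread.currentThread().getId(). Timing uses java.time rather than Date.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class ThreadLoggingDemo {
    // id -> name of the thread that checked it (hypothetical bookkeeping for the demo)
    static final Map<String, String> CHECKED_BY = new ConcurrentHashMap<>();

    static boolean match(String id, String entry) {
        CHECKED_BY.put(id, Thread.currentThread().getName());
        if (id.equals("9")) { // the 10th element is the slow one
            try {
                Thread.sleep(500);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return id.equals(entry);
    }

    static Optional<String> search(String searchFor) {
        // 50 elements with ids "0" to "49"
        List<String> ids = IntStream.range(0, 50)
                .mapToObj(String::valueOf)
                .collect(Collectors.toList());
        return ids.parallelStream().filter(id -> match(id, searchFor)).findAny();
    }

    public static void main(String[] args) {
        Instant start = Instant.now();
        Optional<String> found = search("15");
        Duration elapsed = Duration.between(start, Instant.now());
        CHECKED_BY.forEach((id, thread) -> System.out.println("id " + id + " checked on " + thread));
        System.out.println("found = " + found.orElse("none")
                + ", checked " + CHECKED_BY.size() + " of 50, took " + elapsed.toMillis() + " ms");
    }
}
```

Swapping findAny() for findFirst() in search() makes it possible to compare the two; which ids get checked, and on which threads, differs between runs.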

user16320675