0

I am looking to enforce an order in parallel streams based on some field value. Let me introduce an abstract example:

Let's say we have a class User:

@lombok.Data
class User {
   private String firstName;
   private String lastName;
   private int someValue;
   private int priority;
}

and we have a list of these users:

List<User> users = someInitUsersFunction();

I want to force a parallel stream to process the users priority by priority. Let's say we have 100 users with priority 0, 100 users with priority 1 and 100 users with priority 2.

I want to start processing users with priority 2 only when priority 0 and priority 1 are done.

I think the way

mvn install -T 4

works might be the approach I am looking for (it builds independent modules first). Is it possible to do this with Java streams? Alternatives are also welcome.

My current approach is to split the users into separate lists by priority and then process them list by list.

Naman
Kerdac
    Why do you want to control the processing order when the operation is only completed when all elements have been processed anyway? – Holger Apr 16 '20 at 13:09

3 Answers

1

To process the users in blocks, by priority, but process users of the same priority in parallel, first group the users by priority, then process each group separately.

users.stream().collect(Collectors.groupingBy(User::getPriority, TreeMap::new, Collectors.toList()))
        .values().stream()
        .forEachOrdered(list -> // sequential, in priority order
            list.parallelStream().forEach(user -> { // parallel, unordered
                // process user here
            }));

Without nested streams, and commented for clarity:

// Group users by priority
TreeMap<Integer, List<User>> usersByPriority = users.stream()
        .collect(Collectors.groupingBy(User::getPriority, TreeMap::new, Collectors.toList()));

// Process groups in priority order
for (List<User> list : usersByPriority.values()) {

    // Process users of current priority in parallel
    list.parallelStream().forEach(user -> {
        // process user here
    });

    // We won't loop to the next priority until all users with the current priority have been processed
}
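
To check the claim that one priority block finishes before the next one starts, here is a minimal, self-contained sketch. The `User` class below is a hypothetical stand-in for the Lombok class from the question, and the `processInPriorityBlocks` and `isNonDecreasing` helpers are names made up for illustration. It relies on `forEach()` being a terminal operation that returns only after all elements of the block have been handled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.stream.Collectors;

class PriorityBlocksDemo {

    // Hypothetical stand-in for the Lombok-generated User; only priority matters here.
    static final class User {
        private final String firstName;
        private final int priority;

        User(String firstName, int priority) {
            this.firstName = firstName;
            this.priority = priority;
        }

        String getFirstName() { return firstName; }
        int getPriority() { return priority; }
    }

    // Processes users block by block and returns the priorities in the order they were handled.
    static List<Integer> processInPriorityBlocks(List<User> users) {
        // Thread-safe log, because it is written from several worker threads
        Queue<Integer> handled = new ConcurrentLinkedQueue<>();

        TreeMap<Integer, List<User>> usersByPriority = users.stream()
                .collect(Collectors.groupingBy(User::getPriority, TreeMap::new, Collectors.toList()));

        for (List<User> block : usersByPriority.values()) {
            // forEach() returns only after every user of this block has been handled
            block.parallelStream().forEach(user -> handled.add(user.getPriority()));
        }
        return new ArrayList<>(handled);
    }

    // True if the values never decrease, i.e. no lower priority was handled after a higher one.
    static boolean isNonDecreasing(List<Integer> values) {
        for (int i = 1; i < values.size(); i++) {
            if (values.get(i - 1) > values.get(i)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<User> users = new ArrayList<>();
        for (int i = 0; i < 300; i++) {
            users.add(new User("user" + i, i % 3)); // priorities 0, 1, 2 interleaved
        }
        List<Integer> handled = processInPriorityBlocks(users);
        System.out.println("handled " + handled.size() + " users, blocks in order: "
                + isNonDecreasing(handled));
    }
}
```

Within one block the users are still handled in no particular order; only the boundaries between priorities are guaranteed.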
Andreas
  • @CodeScale You can replace the `forEachOrdered()` with a regular enhanced `for` loop, if you don't want nested streams. – Andreas Apr 16 '20 at 12:40
  • the problem is not the functional or imperative way to do this but verbosity of chaining code. I would say that a more appropriate approach is to extract the second loop/stream in another method (extract method refactoring). This will improve readability of the above stream. – CodeScale Apr 16 '20 at 12:45
0

Sequential vs parallel is not the same as ordering.

If you have an ordered stream and perform operations that are guaranteed to maintain the order, it does not matter whether the stream is processed in parallel or sequentially; the implementation must keep the order.

In your case (only if your list is not already in the order you want), you could use a specific Comparator or let User implement Comparable based on the priority field. Then sort your list before performing the other stream operations. For operations that preserve the encounter order, parallel and sequential execution then give the same result.
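
As a minimal sketch of the sorting suggestion, assuming a hypothetical cut-down `User` with only a name and a priority (the `namesByPriority` helper is also made up for illustration): the stream is sorted by priority and then mapped in parallel, and `collect(Collectors.toList())` keeps the encounter order.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

class SortedParallelDemo {

    // Hypothetical minimal User; the real class in the question has more fields.
    static final class User {
        private final String name;
        private final int priority;

        User(String name, int priority) {
            this.name = name;
            this.priority = priority;
        }

        String getName() { return name; }
        int getPriority() { return priority; }
    }

    // Sorts by priority, maps in parallel, and collects in encounter order.
    static List<String> namesByPriority(List<User> users) {
        return users.parallelStream()
                .sorted(Comparator.comparingInt(User::getPriority)) // stable for ordered streams
                .map(User::getName)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<User> users = Arrays.asList(
                new User("B", 3), new User("C", 1), new User("A", 2), new User("D", 3));
        System.out.println(namesByPriority(users)); // prints [C, A, B, D]
    }
}
```

Note that this fixes the order of the result only. The mapping function itself may still run on the elements in any order and at any time, so sorting alone does not make priority 1 finish before priority 2 starts.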

Or use a specific collection type such as SortedSet or PriorityQueue.

A note on PriorityQueue: its iterator makes no ordering guarantee. As the Javadoc of the related PriorityBlockingQueue puts it, "The Iterator provided in method iterator() is not guaranteed to traverse the elements of the PriorityBlockingQueue in any particular order."

So you either have to sort the elements while streaming to keep the order (stream().sorted()), or simply call poll() on the queue repeatedly.
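
A minimal sketch of the `poll()` variant, again with a hypothetical cut-down `User` and a made-up `drainInPriorityOrder` helper: `poll()` always removes the current head of the queue, so draining the queue this way yields the users in priority order even though iterating over the queue would not.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

class PriorityQueuePollDemo {

    // Hypothetical minimal User; only the priority is needed for this sketch.
    static final class User {
        private final int priority;

        User(int priority) {
            this.priority = priority;
        }

        int getPriority() { return priority; }
    }

    // Drains the queue with poll() and records the priorities in the order they come out.
    static List<Integer> drainInPriorityOrder(List<User> users) {
        PriorityQueue<User> queue = new PriorityQueue<>(Comparator.comparingInt(User::getPriority));
        queue.addAll(users);

        List<Integer> order = new ArrayList<>();
        User next;
        while ((next = queue.poll()) != null) { // poll() returns null once the queue is empty
            order.add(next.getPriority());
        }
        return order;
    }

    public static void main(String[] args) {
        List<User> users = Arrays.asList(
                new User(2), new User(0), new User(1), new User(0), new User(2), new User(1));
        System.out.println(drainInPriorityOrder(users)); // prints [0, 0, 1, 1, 2, 2]
    }
}
```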

CodeScale
0

If you want to process the 100 users with the first priority, then the next 100, and so on, the groups themselves are processed neither in parallel nor concurrently but in sequence. The users within each group can be processed in parallel. A PriorityQueue or a SortedMap is the way to go.

SortedMap:

Use the TreeMap implementation inside the Collectors.groupingBy method:

Map<Integer, List<User>> map = users.stream()
    .collect(Collectors.groupingBy(
         User::getPriority,
         TreeMap::new,
         Collectors.toList()));

The map is sorted by the priority (key).

PriorityQueue:

  • Group by User::getPriority to Map<Integer, List<User>> to group sub-lists by priority
  • Add them to the PriorityQueue, comparing by priority
  • Process in parallel the lists polled sequentially

Start with the grouping:

Map<Integer, List<User>> map = users.stream().collect(Collectors.groupingBy(User::getPriority));

At this moment, the map would look like:

[User(firstName=C, lastName=C, someValue=0, priority=1)]
[User(firstName=A, lastName=A, someValue=0, priority=2)]
[User(firstName=B, lastName=B, someValue=0, priority=3), User(firstName=D, lastName=D, someValue=0, priority=3)]

Create PriorityQueue from the map:

 Queue<List<User>> queue = map.entrySet()
        .stream()
        .collect(
            () -> new PriorityQueue<>(Comparator.comparingInt(list -> list.get(0).getPriority())),
            (pq, entry) -> pq.add(entry.getValue()),
            AbstractQueue::addAll);

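Polling the queue respects the priority, whereas iterating over it does not, because the iterator of a PriorityQueue makes no ordering guarantee. Each polled sub-list can be processed in parallel since its users have the same priority.

List<User> group;
while ((group = queue.poll()) != null) {
    group.stream().parallel()...
}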
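
Putting the PriorityQueue steps together, here is a self-contained sketch under a few assumptions: `User` is a hypothetical cut-down class, `processGroups` is a made-up helper, and the queue is built with `Collectors.toCollection` over `map.values()` as a shorter alternative to the three-argument `collect`. The groups are taken with `poll()`, since the queue's iterator gives no ordering guarantee.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.stream.Collectors;

class GroupQueueDemo {

    // Hypothetical minimal User; only the priority is needed for this sketch.
    static final class User {
        private final int priority;

        User(int priority) {
            this.priority = priority;
        }

        int getPriority() { return priority; }
    }

    // Groups by priority, queues the groups, and processes each polled group in parallel.
    static List<Integer> processGroups(List<User> users) {
        Map<Integer, List<User>> map = users.stream()
                .collect(Collectors.groupingBy(User::getPriority));

        // groupingBy never produces empty groups, so get(0) is safe here
        Comparator<List<User>> byGroupPriority =
                Comparator.comparingInt(list -> list.get(0).getPriority());

        PriorityQueue<List<User>> queue = map.values().stream()
                .collect(Collectors.toCollection(() -> new PriorityQueue<>(byGroupPriority)));

        Queue<Integer> handled = new ConcurrentLinkedQueue<>(); // written from worker threads
        List<User> group;
        while ((group = queue.poll()) != null) {
            // The next group is polled only after forEach() has finished this one
            group.parallelStream().forEach(user -> handled.add(user.getPriority()));
        }
        return new ArrayList<>(handled);
    }

    public static void main(String[] args) {
        List<User> users = Arrays.asList(new User(3), new User(1), new User(2), new User(3));
        System.out.println(processGroups(users)); // prints [1, 2, 3, 3]
    }
}
```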
Nikolas Charalambidis