
I want to make the second part work. Thanks for the help :)

If the ArrayList is ['A','a','B','C','C'], then the expected output is dups: ['A','a','C','C'] and nondups: ['B']. I tried:

Map<String, Long> counts = nums.parallelStream()
    .collect(Collectors.groupingBy(String::toLowerCase, Collectors.counting()));

It gives counts: {a=2, b=1, c=2}

Now I am finding the dups and nondups, which is done by the code below (or suggest an alternate way to solve this):

Map<Boolean, List<String>> hasDup = counts.entrySet().parallelStream()
    .collect(Collectors.partitioningBy(
        entry -> entry.getValue() > 1,
        Collectors.mapping(Map.Entry::getKey, Collectors.toList())));
List<String> dup = hasDup.get(true);
List<String> nodup = hasDup.get(false);

This gives dup: ['a','c'] and nondup: ['b']. I need help with the second part, because the output is not what I want, which is:

dups: ['A','a','C','C'] nondups: ['B']
checked

2 Answers


It seems you're mixing things up, because the algorithm would not return what you want even if it were a non-parallel stream. In the second part you are obtaining the keys you used for grouping when counting duplicates, which are in lowercase. So you won't retrieve the uppercase ones.

Then, what you really need is to "unroll" the dups as many times as the associated count.

List<Character> nums = List.of('A' , 'a' , 'B' , 'C' , 'C');
Map<String, Long> counts = nums.parallelStream()
    .map(String::valueOf)
    .collect(Collectors.groupingBy(String::toLowerCase, Collectors.counting()));

List<Character> dup = new ArrayList<>();
List<Character> nodup = new ArrayList<>();
// route each original element to dup or nodup based on its case-insensitive count
nums.forEach(x -> (counts.get(x.toString().toLowerCase()) > 1 ? dup : nodup).add(x));

Then, since it iterates over nums to fill both dup and nodup lists, you have the original Characters.
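As an aside, the same split can be written without mutating shared lists, by partitioning the original elements directly with `Collectors.partitioningBy`. A minimal sketch (the class and method names are illustrative):

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class PartitionSketch {
    // Split the original elements into duplicates / non-duplicates,
    // keyed by a case-insensitive count, preserving the original case.
    static Map<Boolean, List<Character>> split(List<Character> nums) {
        Map<String, Long> counts = nums.stream()
            .map(String::valueOf)
            .collect(Collectors.groupingBy(String::toLowerCase, Collectors.counting()));
        // Partition the ORIGINAL elements, not the lowercase keys,
        // so 'A' and 'a' both survive in the dup list.
        return nums.stream()
            .collect(Collectors.partitioningBy(
                x -> counts.get(x.toString().toLowerCase()) > 1));
    }

    public static void main(String[] args) {
        Map<Boolean, List<Character>> result = split(List.of('A', 'a', 'B', 'C', 'C'));
        System.out.println("dups: " + result.get(true) + " nondups: " + result.get(false));
        // prints: dups: [A, a, C, C] nondups: [B]
    }
}
```

Because there is no shared mutable state, this version also stays correct if the second stream is made parallel, though whether that is faster is another matter (see the comments below).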

Leo Tapia
  • Thanks for the code. Is there any way to parallelize nums.forEach(x -> (counts.get(x.toString().toLowerCase()) > 1 ? dup : nodup).add(x));? – checked Apr 03 '22 at 04:36
  • Because if the data is large, we can use multiple threads to divide the dups and nondups. Thanks for the help :) – checked Apr 03 '22 at 04:37
  • AFAIK, you can't use parallel streams for that. You would need to manually partition input streams, process each in a separate thread and then merge the resulting maps. And it is unclear if you would get significant speedup by doing that. I suspect you would do better looking for a non-streams solution to this. – Stephen C Apr 03 '22 at 05:16

As Leo notes, the first part does not benefit from parallelism since the work nearly all happens in the collector. The same probably applies to the second part as well.

.... or alternate suggestion to solve this:

I think you will get significantly better performance if you don't use streams. Instead, partition the input manually and give the partitions to N threads to populate a shared ConcurrentHashMap with key -> count entries.
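A minimal sketch of that idea (the chunking scheme and thread count are illustrative, not a tuned implementation):

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class ParallelCount {
    // Count case-insensitive occurrences using N worker threads that
    // all merge into one shared ConcurrentHashMap.
    static Map<String, Long> count(List<Character> nums, int nThreads) {
        ConcurrentHashMap<String, Long> counts = new ConcurrentHashMap<>();
        Thread[] workers = new Thread[nThreads];
        int chunk = (nums.size() + nThreads - 1) / nThreads;
        for (int t = 0; t < nThreads; t++) {
            int from = Math.min(t * chunk, nums.size());
            int to = Math.min(from + chunk, nums.size());
            workers[t] = new Thread(() -> {
                for (int i = from; i < to; i++) {
                    String key = nums.get(i).toString().toLowerCase();
                    counts.merge(key, 1L, Long::sum);  // atomic per-key update
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(count(List.of('A', 'a', 'B', 'C', 'C'), 2));
    }
}
```

For an input this small the thread overhead dwarfs the work; the pattern only pays off on large inputs, and even then it should be measured.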

Also, I think you need to rethink the way you are representing the outputs of this process. It strikes me that generating lists that contain multiple copies of the duplicates is unnecessary and inefficient. Even separating the duplicates from the non-duplicates seems unnecessary. Instead of generating lists, see if you can change the downstream code to stream data from the Map directly. Or push the data into concurrent queues for multiple downstream worker threads to consume.
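For example, consuming straight from the counts map might look something like this (a sketch; the `downstream` consumer is a placeholder for whatever the real downstream code does):

```java
import java.util.List;
import java.util.Map;
import java.util.function.BiConsumer;
import java.util.stream.Collectors;

public class ConsumeFromMap {
    // Push each distinct key straight to a downstream consumer instead of
    // materialising dup/nodup lists. The consumer receives a "dup"/"nondup"
    // tag plus the (lowercased) key.
    static void classify(List<Character> nums, BiConsumer<String, String> downstream) {
        Map<String, Long> counts = nums.stream()
            .map(String::valueOf)
            .collect(Collectors.groupingBy(String::toLowerCase, Collectors.counting()));
        counts.forEach((key, n) ->
            downstream.accept(n > 1 ? "dup" : "nondup", key));
    }

    public static void main(String[] args) {
        classify(List.of('A', 'a', 'B', 'C', 'C'),
            (kind, key) -> System.out.println(kind + ": " + key));
    }
}
```

No intermediate lists are built, and the consumer could just as easily be a `BlockingQueue::add` feeding worker threads.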


Observation: in order to effectively exploit parallelism in a solution to a problem, it is often necessary to change the problem.

Stephen C