
Recently I was exploring the functional-programming features of Java 8 and later, and I came across the static method mapping in the Collectors class.

We have a class Employee like:

@AllArgsConstructor
@Builder
@Getter
public class Employee {
  private String name;
  private Integer age;
  private Double salary;
}

Let's say we have a list of Employee POJOs and we want to get a list of all employee names. There are two approaches:

    List<Employee> employeeList
        = Arrays.asList(new Employee("Tom Jones", 45, 15000.00),
        new Employee("Harry Andrews", 45, 7000.00),
        new Employee("Ethan Hardy", 65, 8000.00),
        new Employee("Nancy Smith", 22, 10000.00),
        new Employee("Deborah Sprightly", 29, 9000.00));

    // IntelliJ suggests replacing the first approach with map and collect

    List<String> collect =
        employeeList
        .stream()
        .collect(
            Collectors.mapping(Employee::getName, Collectors.toList()));

    List<String> collect1 =
        employeeList
            .stream()
            .map(Employee::getName)
            .collect(Collectors.toList());

I know that the first approach does the mapping inside the terminal operation, while the second uses an intermediate operation on the Stream, but I want to know whether the first approach performs worse than the second, or vice versa. I would be grateful if you could explain any potential performance degradation in the first case when our data source (employeeList) grows significantly in size.

EDIT:

I wrote two simple test cases, fed with records generated in a plain for loop. For small inputs, the difference between the "traditional" Stream.map approach and Collectors.mapping is marginal. However, when the input grows to something like 30,000,000 records, Collectors.mapping surprisingly starts to perform slightly better. To give concrete numbers: for 30,000,000 records, Collectors.mapping took 56 seconds for 10 iterations (run as a @RepeatedTest), while the more familiar Stream.map followed by collect took about 5 seconds longer with the same input and the same number of iterations. I know these provisional tests are far from ideal and may not reflect reality because of JVM optimizations, but they suggest that for very large inputs Collectors.mapping may be preferable. Anyway, I think that this

Martin
  • Look at [this post](https://stackoverflow.com/questions/58334705/ways-to-map-the-list-of-objects-to-another-list-of-objects/58334853#58334853); it's a similar question. Lots of variety, empirical data, and links to further posts of the same kind. The resulting baseline argument is that there are differences, but they don't matter. – Curiosa Globunznik Oct 15 '19 at 07:26
  • @Naman I have seen this post, but it answers the question of why we need ```mapping``` and similar methods, which simplify the collect operation on a stream. – Martin Oct 15 '19 at 07:45
  • I'll quote from Andy's answer *I doubt there is a meaningful performance difference. **You'd have to benchmark it on your data to know for sure.*** – Naman Oct 15 '19 at 07:47

1 Answer


I doubt there is a meaningful performance difference. You'd have to benchmark it on your data to know for sure.
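As a rough starting point for such a benchmark, here is a minimal timing sketch. It uses a hypothetical one-field Employee stand-in and a hand-rolled System.nanoTime loop with a short warm-up; the class name, method names, and input size are assumptions made for illustration. Timings from a loop like this are easily skewed by JIT compilation and garbage collection, so a proper harness such as JMH is the better tool for drawing conclusions.

```java
import java.util.List;
import java.util.function.Supplier;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class MapVsMappingTiming {

    // Hypothetical stand-in for the question's Employee, reduced to one field.
    static final class Employee {
        private final String name;

        Employee(String name) {
            this.name = name;
        }

        String getName() {
            return name;
        }
    }

    // Intermediate map operation followed by collect.
    static List<String> viaMap(List<Employee> source) {
        return source.stream()
                .map(Employee::getName)
                .collect(Collectors.toList());
    }

    // Collectors.mapping used directly as the top-level collector.
    static List<String> viaMapping(List<Employee> source) {
        return source.stream()
                .collect(Collectors.mapping(Employee::getName, Collectors.toList()));
    }

    // Runs the task repeatedly and returns the total elapsed time in milliseconds.
    static long timeMillis(Supplier<List<String>> task, int iterations) {
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            // Touch the result so the work cannot be trivially discarded.
            if (task.get().isEmpty()) {
                throw new IllegalStateException("unexpected empty result");
            }
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        List<Employee> employees = IntStream.range(0, 200_000)
                .mapToObj(i -> new Employee("employee-" + i))
                .collect(Collectors.toList());

        // Warm-up runs so the JIT has a chance to compile both paths first.
        timeMillis(() -> viaMap(employees), 5);
        timeMillis(() -> viaMapping(employees), 5);

        System.out.println("map + collect: " + timeMillis(() -> viaMap(employees), 10) + " ms");
        System.out.println("mapping:       " + timeMillis(() -> viaMapping(employees), 10) + " ms");
    }
}
```

Both methods return the same list, so any difference that shows up is purely in how the pipeline is executed.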

Note that mapping isn't really intended to be used directly as a top-level collector, but rather as a downstream collector within another collector. As the Javadoc for Collectors.mapping puts it:

The mapping() collectors are most useful when used in a multi-level reduction, such as downstream of a groupingBy or partitioningBy.
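To illustrate the kind of multi-level reduction that quote refers to, here is a minimal sketch that groups employees by age and collects only their names within each group. The Employee class below is a hand-written stand-in for the Lombok-annotated class in the question, and the class and method names (MappingDownstreamDemo, namesByAge) are made up for this example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class MappingDownstreamDemo {

    // Hand-written stand-in for the Lombok-annotated Employee in the question.
    static final class Employee {
        private final String name;
        private final Integer age;
        private final Double salary;

        Employee(String name, Integer age, Double salary) {
            this.name = name;
            this.age = age;
            this.salary = salary;
        }

        String getName() { return name; }
        Integer getAge() { return age; }
        Double getSalary() { return salary; }
    }

    // Groups employees by age; mapping() is the downstream collector that
    // turns each group's Employee elements into names before listing them.
    static Map<Integer, List<String>> namesByAge(List<Employee> employees) {
        return employees.stream()
                .collect(Collectors.groupingBy(
                        Employee::getAge,
                        Collectors.mapping(Employee::getName, Collectors.toList())));
    }

    public static void main(String[] args) {
        List<Employee> employees = Arrays.asList(
                new Employee("Tom Jones", 45, 15000.00),
                new Employee("Harry Andrews", 45, 7000.00),
                new Employee("Ethan Hardy", 65, 8000.00),
                new Employee("Nancy Smith", 22, 10000.00),
                new Employee("Deborah Sprightly", 29, 9000.00));

        // Each age maps to the names of the employees of that age.
        System.out.println(namesByAge(employees));
    }
}
```

Here a plain map step would not help, because the transformation has to happen inside each group after the classification, which is exactly where a downstream collector fits.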

There is something in Effective Java 3rd Edition about this too (in Item 46, about 2/3 of the way down page 214, the paragraph starting "The collectors returned by the counting method"). Basically, it says not to use things like mapping in the first way you do here.

Andy Turner