
I would like to know what is faster: to filter a custom object by field and then map by its field or vice-versa (map and then filter).
At the end, I usually want to collect the mapped field into some Collection.

For example, the simplest Person class:

public class Person {
    String uuid;
    String name;
    String secondName;

    // Getter used by the stream examples below.
    public String getName() {
        return name;
    }
}

Now let's have a List<Person> persons.

List<String> filtered1 = persons
                .stream()
                .filter(p -> "NEED_TOY".equals(p.getName()))
                .map(Person::getName)
                .collect(Collectors.toList());
// or?
List<String> filtered2 = persons
                .stream()
                .map(Person::getName)
                .filter(name -> "NEED_TOY".equals(name))
                .collect(Collectors.toList());
Boann
keyzj
  • Have you tried profiling both approaches? Just get the time with `System.nanoTime()` before and after execution and log the difference. I'm not sure how "smart" the JVM can optimize streams but my guess is that `filtered1` is slightly faster because it has to map through a smaller array than the original. – Auskennfuchs Aug 17 '19 at 12:16
  • @Auskennfuchs - That isn't how streams work. But it's still true that with streams, filtering can result in fewer mapping calls, it just doesn't literally produce a filtered thingy and then go through it again to map. As you say, though, profiling the actual code (if there's some perf problem that has to be solved) is indeed the right thing to do. – T.J. Crowder Aug 17 '19 at 12:32
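
To make the point about how streams process elements concrete, here is a minimal sketch that logs each lambda invocation. The class name `LazyPipelineDemo` and the `trace` helper are made up for illustration, and the sketch assumes Java 9+ (for `List.of`) and a sequential stream. In practice each element travels through `filter` and then `map` before the next element is touched, so no intermediate filtered list is built.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

class LazyPipelineDemo {

    // Runs filter-then-map over three names and records the order in which
    // the two lambdas are invoked (hypothetical helper, illustration only).
    static List<String> trace() {
        List<String> log = new ArrayList<>();
        List<String> names = List.of("NEED_TOY", "OTHER", "NEED_TOY");

        // Side effects in lambdas are used here only to observe the pipeline;
        // this is safe because the stream is sequential.
        List<Integer> result = names.stream()
                .filter(n -> {
                    log.add("filter " + n);
                    return "NEED_TOY".equals(n);
                })
                .map(n -> {
                    log.add("map " + n);
                    return n.length();
                })
                .collect(Collectors.toList());

        log.add("result " + result);
        return log;
    }

    public static void main(String[] args) {
        // Prints the filter/map calls interleaved per element,
        // with no "map OTHER" line because that element was filtered out.
        trace().forEach(System.out::println);
    }
}
```

The log shows `map` being called only for the elements that passed `filter`, which is why filtering first can save mapping calls even though nothing is materialized in between.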

2 Answers


In this specific example, where calling Person.getName() costs essentially nothing, the order doesn't matter, and you should use whichever version you find more readable. Filtering after mapping could even be marginally faster since, as TJ mentions, the mapping operation is part of the filtering operation: the predicate already calls getName(), so filtering first calls it a second time for every match.

If the mapping operation has a significant cost however, then filtering first (if possible) is more efficient, since the stream won't have to map the elements that have been filtered out.

Let's take a contrived example: you have a stream of IDs and, for every even ID in the stream, you have to execute an HTTP GET request or a database query to fetch the details of the item identified by that ID (thus mapping the ID to a detailed object).

Assuming that half of the IDs are even and half are odd, and that each request takes the same time, filtering first halves the total time. If every HTTP request takes 1 second and you have 60 IDs, the same task goes from 60 seconds to 30 seconds, and you also reduce the load on the network and on the external HTTP API.
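
The 60-versus-30 arithmetic can be checked with a small sketch. Everything here is hypothetical: `fetchDetails` merely stands in for the HTTP GET or database query and increments a counter instead of doing any I/O, and the class and method names are invented for the example (Java 9+ is assumed for `Map.entry`).

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class FilterFirstDemo {

    // Counts how often the simulated expensive lookup is invoked.
    static final AtomicInteger lookups = new AtomicInteger();

    // Hypothetical stand-in for an HTTP GET request or a database query.
    static String fetchDetails(int id) {
        lookups.incrementAndGet();
        return "details-" + id;
    }

    // Filter on the ID first, so only even IDs reach the expensive lookup.
    static int lookupsWhenFilteringFirst(List<Integer> ids) {
        lookups.set(0);
        ids.stream()
           .filter(id -> id % 2 == 0)
           .map(FilterFirstDemo::fetchDetails)
           .collect(Collectors.toList());
        return lookups.get();
    }

    // Map first: every ID is looked up, and the odd ones are discarded afterwards.
    static int lookupsWhenMappingFirst(List<Integer> ids) {
        lookups.set(0);
        ids.stream()
           .map(id -> Map.entry(id, fetchDetails(id)))
           .filter(e -> e.getKey() % 2 == 0)
           .map(Map.Entry::getValue)
           .collect(Collectors.toList());
        return lookups.get();
    }

    public static void main(String[] args) {
        List<Integer> ids = IntStream.rangeClosed(1, 60).boxed().collect(Collectors.toList());
        System.out.println(lookupsWhenFilteringFirst(ids)); // 30 lookups
        System.out.println(lookupsWhenMappingFirst(ids));   // 60 lookups
    }
}
```

Both pipelines produce the same 30 results; only the number of expensive calls differs.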

JB Nizet
  • Not only in the OP’s example is the mapping operation part of the filter predicate. That applies to *all* scenarios where this question makes sense. When the predicate operates on the result of the mapping operation, it is straight-forward to perform the mapping operation first, which is more readable, reducing code duplication *and* avoiding redundant work. But when the filter does not operate on the result of the mapping function, the question, where to place it, does not arise, as the required input mandates the placement and you can’t swap them anyway. – Holger Aug 19 '19 at 08:33
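
The redundant work described in this comment can be observed by counting getter calls. The following is a rough sketch only: the instrumented `Person`, the counter, and the helper names are invented for illustration and are not part of the question's class.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collectors;

class GetterCallCountDemo {

    // Counts every call to Person.getName() (instrumentation for illustration only).
    static final AtomicInteger getNameCalls = new AtomicInteger();

    // Minimal stand-in for the question's Person class with a counting getter.
    static class Person {
        private final String name;

        Person(String name) {
            this.name = name;
        }

        String getName() {
            getNameCalls.incrementAndGet();
            return name;
        }
    }

    // Two matching and two non-matching persons.
    static List<Person> samplePersons() {
        return List.of(new Person("NEED_TOY"), new Person("OTHER"),
                       new Person("NEED_TOY"), new Person("OTHER"));
    }

    // filter -> map: the predicate calls getName(), then map calls it again for each match.
    static int callsFilterThenMap(List<Person> persons) {
        getNameCalls.set(0);
        persons.stream()
               .filter(p -> "NEED_TOY".equals(p.getName()))
               .map(Person::getName)
               .collect(Collectors.toList());
        return getNameCalls.get();
    }

    // map -> filter: getName() is called exactly once per element.
    static int callsMapThenFilter(List<Person> persons) {
        getNameCalls.set(0);
        persons.stream()
               .map(Person::getName)
               .filter("NEED_TOY"::equals)
               .collect(Collectors.toList());
        return getNameCalls.get();
    }

    public static void main(String[] args) {
        System.out.println(callsFilterThenMap(samplePersons())); // 4 in filter + 2 in map = 6
        System.out.println(callsMapThenFilter(samplePersons())); // 4
    }
}
```

With a trivial getter the extra calls are negligible, but the count shows why mapping first avoids duplicated work whenever the predicate depends on the mapped value.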

Performance depends mainly on

  • how complex the operations you perform while streaming are (your business logic)
  • how complex your data is

Let's take two simple scenarios.

Scenario 1

If your map function performs an expensive operation, such as calling an external REST API to transform the stream objects, I recommend filtering before mapping, since that avoids the unnecessary expensive REST calls for elements that would be discarded anyway. Note that if the filter predicate itself has to perform the mapping operation, filtering first performs that operation twice for every matching object.


Scenario 2

Assume that you first need to transform the stream elements using an external REST API call or some other function, and then filter on the results. In this scenario you have to map before you filter. This approach can be slightly faster than the previous one, since the mapping operation is part of the filtering operation.


Community
Nidhish Krishnan
  • You missed the point that the same operation performed by `map`, is also performed by `filter`. So doing `filter` first, is performing the mapping operation twice for all matching objects. – Holger Aug 19 '19 at 08:26