Best practice for java stream().filter()

Question

I'm working on a java spring application and want to filter and get specific item in mongodb database, but it is quite messy my code and would like to know how to improve my code when working with stream filter

public String getPrimaryTechnology(String id){
    Article article = articleRepository.findById(id).get();

    List<AWS_entity> aws_entitiesList =  article.getAws().getEntities();

    Predicate<AWS_entity> priority1 = aws -> article.getTitle().contains(aws.getName()) 
            && aws.getCategory().equals("TITLE");

    Predicate<AWS_entity> priority2 = aws -> article.getTitle().contains(aws.getName()) 
            && aws.getCategory().equals("COMMERCIAL_ITEM"));

    Predicate<AWS_entity> priority3 = aws -> article.getTitle().contains(aws.getName())
            && (aws.getCategory().equals("ORGANIZATION") || aws.getCategory().equals("OTHER"));


    Optional<AWS_entity> aws_entity = Optional.ofNullable(aws_entitiesList
            .stream()
            .sorted(Comparator.comparing(AWS_entity::getCount).reversed())
            .filter(priority1)
            .findFirst()
            .orElse(aws_entitiesList.stream()
                    .filter(priority2)
                    .findFirst()
                    .orElse(aws_entitiesList.stream()
                            .filter(priority3)
                            .findFirst().orElse(null)
                    )
            ))
    ;

    if(!aws_entity.isPresent())
        return "none identified";
    else return aws_entity.get().getName();
}

@Mhanxsolo if you want to improve speed, I'd propose to improve the streaming solutions from the answers. instead of sort->filter->findfirst use filter->max (or min), and instead of filter->findfirst use findAny. — Curiosa Globunznik, Dec 08 '19 at 05:27
loops aren't that much faster, except they operate on arrays with primitive types, not objects. you can find some loop vs stream comparisons. — Curiosa Globunznik, Dec 08 '19 at 05:43
Just a quick heads-up: when you use `.orElse` its argument will _always_ be evaluated, regardless of whether or not it is actually required. Considering you are going through (potentially) large collections, this can be quite expensive (read: slow). Use `.orElseGet` in such a case: the expensive operation will then only be performed when required. `.orElse` should be reserved for cases when you return a constant (i.e. `Collection.emptyList`, `""`, `0`). — knittl, Dec 08 '19 at 08:24
There's a stream refactoring/optimization [plugin for eclipse](https://marketplace.eclipse.org/content/optimize-java-8-streams-refactoring), here's the related [research paper](https://academicworks.cuny.edu/cgi/viewcontent.cgi?article=1475&context=hc_pubs) — Curiosa Globunznik, Dec 08 '19 at 16:42
And [here](https://stackoverflow.com/questions/24054773/java-8-streams-multiple-filters-vs-complex-condition) somebody measured loops vs streams — Curiosa Globunznik, Dec 08 '19 at 16:51

ernest_k · Accepted Answer · 2019-12-08T05:46:24.283

3

Two possible ways of simplifying this (you've already done much to make the code more readable)

First, you can use a higher-order function to compute your predicates. That way, you don't need to declare three.
Second, you can use the same chain with Optional.or.

Here's what it can look like:

Article article = articleRepository.findById(id).get();
List<AWS_entity> aws_entitiesList = article.getAws().getEntities();

//a function that returns your predicates
Function<List<String>, Predicate<AWS_entity>> predictateFunction = 
        list -> aws -> article.getTitle().contains(aws.getName())
                        && list.stream()
                        .anyMatch(category -> aws.getCategory.equals(category));

return aws_entitiesList.stream()
        .filter(predictateFunction.apply(Arrays.asList("TITLE")))
        .max(Comparator.comparing(AWS_entity::getCount))
        .or(() -> aws_entitiesList.stream()
                    .filter(predictateFunction.apply(
                             Arrays.asList("COMMERCIAL_ITEM")))
                    .findFirst()
        )
        .or(() -> aws_entitiesList.stream()
                    .filter(predictateFunction.apply(
                             Arrays.asList("ORGANIZATION", "OTHER")))
                    .findFirst()
        )
        .map(entity -> entity.getName())
        .orElse("none identified");

edited Dec 08 '19 at 05:46

answered Dec 08 '19 at 05:23

ernest_k

44,416
5
53
99

you could use filter->max (or min, depends) instead of sort->filter->findfirst and findAny instead of filter->findfirst. – Curiosa Globunznik Dec 08 '19 at 05:31
@gurioso thanks - would that be significantly better than, say *filter -> sort -> findFirst*? – ernest_k Dec 08 '19 at 05:35
1

sort is certainly more expensive than picking a max/min. streaming can optimize a bit, but not that, I'd guess. I'd do an experiment. could be it guesses the optimization for filter->findfirst. – Curiosa Globunznik Dec 08 '19 at 05:36
thank you ernest_k and gurioso, I think this way the code is more readable instead of using orElse and to use max/min would be a better option than sorted->filter->findFirst – Mhanxsolo Dec 08 '19 at 05:56
1

One can simplify the `Function` to map to a String input as `Function> predicateFunction = str -> aws -> article.getTitle().contains(aws.getName()) && aws.getCategory().equals(str);` and further the `anyMatch` could be replaced with `Predicate.or` such as `predicateFunction.apply("ORGANIZATION").or(predicateFunction.apply("OTHER"))` – Naman Dec 08 '19 at 07:40
1

@ernest_k: Depending on the actual structure of the entities, this might result in a different result. The version from this answer will return `"none identified"` if either the entity was not found, or the name of the entity was `null`. The original version in the question only return `"none identified"` for missing entities. If an entity was found, but didn't have a name, then `null` was returned. – knittl Dec 08 '19 at 08:27
Right, @knittl. Mhanxsolo, if your `getName` method may return null and you'd rather have those null values returned instead of the default value, let me know and I'll edit the answer. – ernest_k Dec 08 '19 at 08:38
there's a [bit of indication](https://stackoverflow.com/questions/48509561/does-java-compiler-optimize-stream-filtering) that also filter->findfirst won't get translated to findAny. But why write more code than necessary anyway that may be slower even? There's a stream refactoring/optimization [plugin for eclipse](https://marketplace.eclipse.org/content/optimize-java-8-streams-refactoring), here's the related [research paper](https://academicworks.cuny.edu/cgi/viewcontent.cgi?article=1475&context=hc_pubs) – Curiosa Globunznik Dec 08 '19 at 16:46

score 0 · Answer 2 · answered Dec 08 '19 at 08:17

To add another approach, one could use a Function<String, Stream<AWS_entity>> such as:

Function<String, Stream<AWS_entity>> priorityStream = str ->
        aws_entitiesList.stream()
                .filter(aws -> article.getTitle().contains(aws.getName())
                        && aws.getCategory().equals(str));

and then make use of it as

Stream<AWS_entity> firstPref = priorityStream.apply("TITLE")
        .sorted(Comparator.comparing(AWS_entity::getCount).reversed());
Stream<AWS_entity> secondPref = priorityStream.apply("COMMERCIAL_ITEM");
Stream<AWS_entity> thirdPref = Stream.concat(priorityStream.apply("ORGANIZATION"),
        priorityStream.apply("OTHER"));

return firstPref.findFirst()
        .or(secondPref::findFirst)
        .or(thirdPref::findFirst)
        .map(AWS_entity::getName)
        .orElse("none identified");

Best practice for java stream().filter()

2 Answers2