1

I have a Collection<Product> and a Collection<Predicate<Product>>. The Predicates are a simple combination of boolean flags. For example:

             | isNew | isSoldOut
--------------------------------
predicate 1: | false | false
predicate 2: | false | true
predicate 3: | true  | false
predicate 4: | true  | true

I want to find "one of every kind", though I'd like to find the first match for every Predicate within the products Collection.

Currently my code looks like this:

List<Product> products = getProducts();
List<Predicate<Product>> predicates = getPredicates();

List<Product> result = predicates.stream()
  .flatMap(predicate -> products.stream().filter(predicate).findFirst().stream())
  .collect(Collectors.toList());

But this of course will iterate of the products Collection multiple times, which is not desired, because in my case I have 100000 Products and 64 Predicates and it takes a long time.

In my special case the Predicates are mutually exclusive: If a Predicate returns true, then all other Predicates can be skipped for that particular Product. And because I use findFirst this Predicate then can be skipped for all other Products.

I was wondering if it is possible to iterate over the Product Collection instead, and check every Product only once against all the Predicates.

Benjamin M
  • 23,599
  • 32
  • 121
  • 201
  • Perhaps use something other than a stream? When distinct elements influence the processing of others in the same pipeline (which is necessary in this case), streams usually become less usable. – ernest_k Feb 07 '22 at 12:14
  • How about a map of combinations -> product fullfilling them? – daniu Feb 07 '22 at 12:14
  • 2
    @ernest_k I thought about using a simple nested for-loop, which iterates over `products` on the outside, and `predicates` on the inside. This loop then could remove items from the `predicates` list, when something matches and call `continue` on the outer loop. `...` but I was hoping that there's maybe some clever solution built into the `Stream` framework – Benjamin M Feb 07 '22 at 12:22

3 Answers3

2

How about doing it the other way around? Streaming the products, and applying predicates on them.

List<Predicate<Product>> predicates = getPredicates();
List<Product> products = getProducts();
List<Product> filtered = products.stream().filter(product -> {
    Iterator<Predicate<Product>> iterator = predicates.iterator();
    while (iterator.hasNext()) {
        Predicate<Product> currentPredicate = iterator.next();
        if (currentPredicate.test(product)) {
             iterator.remove();
             return true;
        }
    }
    return false;
}).collect(Collectors.toList());

The downside is you have to be careful which collection you use for predicates, Iterator.remove is not always supported.

Edit: Looks like i wasn't reading carefully enough. I think getting one of each would be best with loop.

List<Product> products = getProducts();
List<Predicate<Product>> predicates = getPredicates();
List<Product> matchingProducts = new ArrayList<>(predicates.size());
for (Product product : products) {
    if (predicates.isEmpty()) {
        break;
    }
    for (int predicateIndex = 0; predicateIndex < predicates.size(); predicateIndex++) {
        Predicate<Product> predicate = predicates.get(predicateIndex);
        if (predicate.test(product)) {
            matchingProducts.add(product);
            predicates.remove(predicateIndex);
            break;
        }
    }
}

Actually managed to achieve it with a stream and takeWhile, you were correct, Benjamin.

List<Predicate<Product>> predicates = getPredicates();
List<Product> products = getProducts();
List<Product> matches = products.stream()
        .takeWhile(product -> !predicates.isEmpty())
        .filter(product -> {
            Iterator<Predicate<Product>> iterator = predicates.iterator();
            while (iterator.hasNext()) {
                if (iterator.next().test(product)) {
                    iterator.remove();
                    return true;
                }
            }
            return false;
        })
        .collect(Collectors.toList());

Just make sure takeWhile is before filter, otherwise last matching element gets skipped.

Chaosfire
  • 4,818
  • 4
  • 8
  • 23
  • The second solution seems correct, the first one may find multiple products for the same predicate though. In terms of performance it is likely that looping over the predicates is more efficient than `or()`ing them all (as it creates a big nested predicates lambda structure), but of course that should be properly benchmarked if it could be an issue. – Didier L Feb 07 '22 at 13:01
  • I think you, too, are also missing the OP's point that they want "one of each". That is, one product that matches each of the predicates. The result needs to contain four products in their example. – RealSkeptic Feb 07 '22 at 13:17
  • As said by the other comments: I want **one** Product for every Predicate (4 Predicates = (up to) 4 Products). Your second code snippet looks fine, but could be improved: If `predicates` is empty, the stream can be stopped. Can this be solved using `takeWhile`? – Benjamin M Feb 07 '22 at 13:44
  • @RealSkeptic you are right, i guess i need to read more carefully. – Chaosfire Feb 07 '22 at 14:01
  • @BenjaminM I suppose `takeWhile` will work, check the edit. But because i am not sure it will, i posted option with loops. – Chaosfire Feb 07 '22 at 14:05
  • @BenjaminM `takeWhile` works as you suggested, it's just that we need external counter, check last edit. Using `predicates.isEmpty()` actually skips last matching element. – Chaosfire Feb 07 '22 at 15:07
  • @Chaosfire I think you could check for `predicates.isEmpty()` instead of using a separate `int` to count? The Predicates List should be empty, when you removed all of them using the iterator. – Benjamin M Feb 07 '22 at 15:49
  • @BenjaminM Actually we could check `predicates.isEmpty()`, you are right. We just have to use `takeWhile` before `filter`, it took me a while to realize this ordering affects end result. Anyway edited again, it's much cleaner like this. – Chaosfire Feb 07 '22 at 16:13
  • Instead of `takeWhile(...)` before, `limit(predicates.size())` could be used after the filter to avoid querying the predicates collection every iteration. – Andreas Berheim Brudin Feb 07 '22 at 21:09
1

Your current solution will iterate over the collection many times, but as findFirst is a short-circuiting operator, it will stop as soon as it finds a match. Have you bench-marked it to make sure it is not good enough?

An alternative is to use a stateful filter (see the top answer to this post):

public static Predicate<Product> matchAndDiscard(final List<Predicate<Product>> predicates) {
  final Set<Predicate<Product>> remaining = new HashSet<>(predicates);
  return product -> {
    final var match = remaining.stream().filter(pred -> pred.test(product)).findFirst();
    match.ifPresent(remaining::remove);
    return match.isPresent();
  };
}

much like @Chaosfire's approach, but with the state contained within the filter function. If you believe that all predicates will be matched by at least one product, you can also save some time by limiting the stream to the number of predicates, like so:

final var predicates = getPredicates()
final var result = getProducts().stream()
    .filter(matchAndDiscard(predicates))
    .limit(predicates.size())
    .toList()

In your current solution, you would traverse the products "horisontally":

       --> products
pred1: ffffffffffffft
pred2: fffft
pred3: ffffffffffffffft
pred4: ft
etc.

The alternative will instead do a "vertical" traversal:

           products
pred1: | ffffffffffffft
pred2: | fffft
pred3: v ffff fffffffffft
pred4:   ft

so it is not obvious that one would be much faster than the other, it depends on the particular configuration.

Andreas Berheim Brudin
  • 2,214
  • 1
  • 18
  • 13
  • The stateful filter is pretty neat! In my use case I can't make sure that every Predicate has at least one matching product, but this still may help other people having a similar issue. – Benjamin M Feb 07 '22 at 20:40
  • 1
    @BenjaminM The limit doesn't hurt anyway, because if not all predicates find a matching product, it will simply have no effect - so it should work for your use case as well, it is just an optimization *if* all the predicates matches. – Andreas Berheim Brudin Feb 07 '22 at 20:45
0

If I understand correctly, you're looking for something like:

List<Product> results = products.stream()
                        .filter(prod -> predicates.stream()
                                        .anyMatch(pred -> pred.test(prod)))
                        .collect(Collectors.toList());
Shawn
  • 47,241
  • 3
  • 26
  • 60