
I have a query that finds the most similar value. Now I need to define a maximum Levenshtein distance: if the score is more than 2, I don't want the value to appear as a recommendation.

String recommendation =  candidates.parallelStream()
            .map(String::trim) 
            .filter(s -> !s.equals(search))
            .min((a, b) -> Integer.compare(
              cache.computeIfAbsent(a, k -> StringUtils.getLevenshteinDistance(Arrays.stream(search.split(" ")).sorted().toString(), Arrays.stream(k.split(" ")).sorted().toString()) ),
              cache.computeIfAbsent(b, k -> StringUtils.getLevenshteinDistance(Arrays.stream(search.split(" ")).sorted().toString(), Arrays.stream(k.split(" ")).sorted().toString()))))
            .get();
Cœur
userit1985
  • `Arrays.stream(search.split(" ")).sorted().toString()` => are you sure you want to call `Stream::toString` ? – assylias Jan 25 '16 at 13:15
  • Yes. The query works great, but I need to define a maximum threshold for the Levenshtein distance: if the distance is greater than 2, I don't want this value to be recommended – userit1985 Jan 25 '16 at 13:22
  • Generally, you can avoid the code duplication in a comparator by using one of the comparing… methods, e.g. [`comparingInt`](https://docs.oracle.com/javase/8/docs/api/java/util/Comparator.html#comparingInt-java.util.function.ToIntFunction-) in your case. Nevertheless, think about assylias’ question again. Invoking `toString()` on a stream is quite surely not what you want. – Holger Jan 25 '16 at 13:51
  • @Holger The query is trying to find the most similar token, including cases such as 'Brand Compatible' being recommended for 'Compatible Brand'. To handle those cases, I first split the String on the delimiter " ", then sort the tokens, and then run the getLevenshteinDistance function. But I don't want every result to be returned, only values with a distance lower than 2. – userit1985 Jan 25 '16 at 14:03
  • @userit1985: Whatever your lengthy explanation is aiming at, it won’t change the fact that calling `toString()` on such a stream will return something like `"java.util.stream.SortedOps$OfRef@15db9742"` and calculating anything out of these strings is unlikely to lead you anywhere. – Holger Jan 25 '16 at 14:48
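Holger's `comparingInt` suggestion can be sketched as follows. The class name `ComparingIntSketch` and the `levenshtein` helper are illustrative assumptions: the helper is a plain dynamic-programming stand-in for `StringUtils.getLevenshteinDistance`, so the snippet compiles without Commons Lang. The distance expression is written once instead of once per comparator argument.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

class ComparingIntSketch {

    // Hypothetical stand-in for StringUtils.getLevenshteinDistance (Apache Commons Lang),
    // using the classic two-row dynamic-programming formulation.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // distance from the empty prefix of a
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev; // the row just computed becomes the previous row
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // comparingInt applies the single key extractor to both elements being compared.
    static Optional<String> closest(List<String> candidates, String search) {
        return candidates.stream()
                .map(String::trim)
                .filter(s -> !s.equals(search))
                .min(Comparator.comparingInt(s -> levenshtein(search, s)));
    }
}
```

The key extractor still runs for both elements of every comparison, so the distance can be computed more than once per candidate; that is why a cache can still pay off.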

1 Answer


Your question is about a single filtering operation: how to exclude the elements with a score greater than 2. You need to write a predicate for it. The simplest predicate that can be written without knowing any details about the rest of your application logic is the following:

.filter(s -> StringUtils.getLevenshteinDistance(search, s) <= 2)
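A minimal sketch of that predicate inside a complete pipeline, assuming a sequential stream and no cache. `ThresholdSketch`, `MAX_DISTANCE` and the `levenshtein` helper (a stand-in for `StringUtils.getLevenshteinDistance`) are hypothetical names. It also replaces the question's `.get()` with `.orElse(null)`: once the filter can remove every candidate, `get()` on the empty `Optional` would throw `NoSuchElementException`.

```java
import java.util.Comparator;
import java.util.List;

class ThresholdSketch {

    // Hypothetical name for the cut-off: candidates farther than this are dropped.
    static final int MAX_DISTANCE = 2;

    // Hypothetical stand-in for StringUtils.getLevenshteinDistance.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Returns the closest candidate within MAX_DISTANCE, or null if none qualifies.
    static String recommend(List<String> candidates, String search) {
        return candidates.stream()
                .map(String::trim)
                .filter(s -> !s.equals(search))
                .filter(s -> levenshtein(search, s) <= MAX_DISTANCE) // the threshold predicate
                .min(Comparator.comparingInt(s -> levenshtein(search, s)))
                .orElse(null); // get() would throw when nothing passes the filter
    }
}
```

In this version the distance is computed both in the filter and in the comparator. Commons Lang 3 also offers a three-argument `StringUtils.getLevenshteinDistance(s, t, threshold)` that returns -1 when the distance exceeds the threshold, which could replace the explicit `<= 2` comparison.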

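Considering that you cache the Levenshtein scores in a map, the predicate should be rewritten this way (note that with parallelStream() the cache must be thread-safe, e.g. a ConcurrentHashMap; calling computeIfAbsent on a plain HashMap from several threads is not safe):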

.filter(s -> cache.computeIfAbsent(s, k -> StringUtils.getLevenshteinDistance(search, k)) <= 2)
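The cached predicate in a parallel pipeline could look like the following sketch. It assumes the cache is a `ConcurrentHashMap` created per search term (the question doesn't show how `cache` is declared); `CachedThresholdSketch`, `MAX_DISTANCE` and the `levenshtein` stand-in are hypothetical names. Every element that reaches `min` has already passed the filter, which stored its score, so the comparator can read it back with `get`.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class CachedThresholdSketch {

    static final int MAX_DISTANCE = 2; // hypothetical name for the cut-off

    // Hypothetical stand-in for StringUtils.getLevenshteinDistance.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    static String recommend(List<String> candidates, String search) {
        // Thread-safe, because parallelStream() may call computeIfAbsent from several threads.
        Map<String, Integer> cache = new ConcurrentHashMap<>();
        return candidates.parallelStream()
                .map(String::trim)
                .filter(s -> !s.equals(search))
                .filter(s -> cache.computeIfAbsent(s, k -> levenshtein(search, k)) <= MAX_DISTANCE)
                .min(Comparator.comparingInt(s -> cache.get(s))) // score was stored by the filter
                .orElse(null);
    }
}
```

A cache keyed only by the candidate string is valid for a single search value; sharing it across different search terms would return stale distances.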

Now, if you want to do anything else with the elements, like splitting, reordering and joining them, you can further enhance this code, but that's outside the scope of your question.

Nevertheless, speaking of the splitting/joining, let me correct an error in your code. The line

Arrays.stream(search.split(" ")).sorted().toString()

does not really do anything useful. Stream does not override toString(), so this just returns the default object representation (class name plus hash code), not the sorted words. I guess you wanted this:

Arrays.stream(s.split(" ")).sorted().collect(Collectors.joining(" "))

This code sorts the words alphabetically: "Malus Casus" -> "Casus Malus"
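Combining the word sorting with the threshold might look like this sketch, which covers the 'Compatible Brand' / 'Brand Compatible' case from the comments. `TokenSortSketch`, `normalize`, `MAX_DISTANCE` and the `levenshtein` stand-in are hypothetical names. Splitting on the regex `\s+` after `trim()` instead of on `" "` is an assumption, so that repeated spaces don't produce empty tokens. The search term is normalized once, outside the lambdas.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

class TokenSortSketch {

    static final int MAX_DISTANCE = 2; // hypothetical name for the cut-off

    // Sorts the words so that word order does not affect the distance.
    static String normalize(String s) {
        return Arrays.stream(s.trim().split("\\s+"))
                .sorted()
                .collect(Collectors.joining(" "));
    }

    // Hypothetical stand-in for StringUtils.getLevenshteinDistance.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    static String recommend(List<String> candidates, String search) {
        String normalizedSearch = normalize(search); // computed once, not per comparison
        return candidates.stream()
                .map(String::trim)
                .filter(s -> !s.equals(search))
                .filter(s -> levenshtein(normalizedSearch, normalize(s)) <= MAX_DISTANCE)
                .min(Comparator.comparingInt(s -> levenshtein(normalizedSearch, normalize(s))))
                .orElse(null);
    }
}
```

sorted() uses natural String order, which is case-sensitive (uppercase letters sort before lowercase), so lowercasing the tokens first may be worth considering.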

nolexa