
I have the following Cypher query:

MATCH (n)-[r]->(k) 
WHERE ANY(x in keys(n) 
    WHERE round(apoc.text.levenshteinSimilarity(
       TRIM(
          REDUCE(mergedString = "", item in n[x] 
               | mergedString + item + " ")), "syn"), 4) 
                   > 0.8) 
RETURN n, r, k

How can I return the score that the similarity function computes inside the WHERE clause?

I am trying to do this with WITH, without luck:

MATCH (n)-[r]->(k) 
WITH *,  [x in keys(n) | [x, round(apoc.text.levenshteinSimilarity(TRIM(REDUCE(mergedString = '', item in n[x] | mergedString + item + ' ')), 'syn'), 4)]] as scores
WHERE [s in scores WHERE s[1] >= 0.8]
RETURN n,r,k,[s in scores WHERE s[1] >= 0.8] AS attr_scores
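For reference, the per-property score that both queries compute can be modeled in plain Python. This is a minimal sketch, not APOC's code: it assumes `apoc.text.levenshteinSimilarity` returns `(longer length - edit distance) / longer length`, a value between 0 and 1, and the node properties are hypothetical list-valued examples, since REDUCE iterates over `n[x]` as a list.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def levenshtein_similarity(a: str, b: str) -> float:
    """Assumed model of apoc.text.levenshteinSimilarity: 1 - distance / longer length."""
    longer = max(len(a), len(b))
    if longer == 0:
        return 1.0
    return (longer - levenshtein(a, b)) / longer


def property_score(values, target="syn"):
    # Mirrors REDUCE(mergedString = "", item IN n[x] | mergedString + item + " ")
    merged = ""
    for item in values:
        merged = merged + item + " "
    # Mirrors TRIM(...) followed by round(..., 4)
    return round(levenshtein_similarity(merged.strip(), target), 4)


# Hypothetical node properties; each value is a list of strings.
node = {"name": ["syn"], "tags": ["sync"], "title": ["foo", "bar"]}
scores = {key: property_score(value) for key, value in node.items()}
print(scores)  # → {'name': 1.0, 'tags': 0.75, 'title': 0.0}
```

With these made-up values only `name` would pass the `> 0.8` test in the first query.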
SteveS
  • Can you please help? @jose_bacoy – SteveS Nov 20 '22 at 15:10
  • Can you show us a sample input along with the expected output? – Charchit Kapoor Nov 20 '22 at 16:24
  • I am looking for a concept. Jose already helped me with the query, but I also need the scores. I have updated the answer with the latest solution, and I am looking to make it more efficient. https://stackoverflow.com/questions/74479572/cant-apply-fuzzy-distance-function-in-a-cypher-query-that-checks-similarity-aga – SteveS Nov 20 '22 at 16:26
  • @CharchitKapoor, look at my edited answer and latest comment. I think it's already solved, but it seems I shouldn't need to run `[s in scores WHERE s[1] >= 0.8]` twice. Is there any way to filter all the results above the 0.8 threshold once and return only the relevant attributes and their scores? – SteveS Nov 20 '22 at 16:32
  • `[s in scores WHERE s[1] >= 0.8]` is present in two places, in the WHERE clause and in the return statement. Which one do you think is redundant? I think you can remove the WHERE clause. – Charchit Kapoor Nov 20 '22 at 16:37

1 Answer


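To return only the relevant attributes with a score of at least 0.8, remove the WHERE clause and filter inside the list comprehension instead (rows in which no attribute reaches the threshold are still returned, with an empty attr_scores list):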

MATCH (n)-[r]->(k) 
WITH *,  [x in keys(n) | [x, round(apoc.text.levenshteinSimilarity(TRIM(REDUCE(mergedString = '', item in n[x] | mergedString + item + ' ')), 'syn'), 4)]] as scores
RETURN n,r,k,[s in scores WHERE s[1] >= 0.8 | s] AS attr_scores
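The per-row effect of `[s in scores WHERE s[1] >= 0.8 | s]` can be mimicked with a Python list comprehension over `[key, score]` pairs. This is only a sketch: the rows and their scores are hypothetical, precomputed values, and the second row shows what happens when no attribute qualifies.

```python
THRESHOLD = 0.8

# Hypothetical (n, r, k) rows, each carrying precomputed [key, score] pairs.
rows = [
    {"n": "a", "r": "REL", "k": "b", "scores": [["name", 1.0], ["tags", 0.75]]},
    {"n": "c", "r": "REL", "k": "d", "scores": [["name", 0.25]]},
]

# Equivalent of: RETURN n, r, k, [s IN scores WHERE s[1] >= 0.8 | s] AS attr_scores
# There is no row-level filter, so every row is kept.
results = [
    {"n": row["n"], "r": row["r"], "k": row["k"],
     "attr_scores": [s for s in row["scores"] if s[1] >= THRESHOLD]}
    for row in rows
]

for result in results:
    print(result)
```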

Finally, together with Charchit Kapoor, we found the best solution:

MATCH (n)-[r]->(k)  
UNWIND keys(n) as key  
WITH n, r, k, key, round(apoc.text.levenshteinSimilarity(TRIM(REDUCE(mergedString = "", item in n[key] | mergedString + item + " ")), "syn"), 4) as score
WITH n, r, k, collect({key:key, value:n[key], score:score}) as keyScores
WITH n, r, k, [s in keyScores WHERE s.score >= 0.8 | s] AS attr_scores
WHERE size(attr_scores) > 0
RETURN *
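The data flow of this final query can be traced with a small Python sketch. The matched rows are hypothetical, and `score()` is a made-up stand-in that returns fixed values instead of calling the APOC similarity function. In Cypher, `UNWIND keys(n)` followed by `collect()` splits each row into one row per key and then regroups them by `(n, r, k)`; the Python loop keeps each row together, so that round trip becomes a single comprehension.

```python
THRESHOLD = 0.8

# Hypothetical matched rows: (n, r, k), where n maps property keys to list values.
matches = [
    ({"name": ["syn"], "tags": ["sync"]}, "r1", "k1"),
    ({"title": ["foo", "bar"]}, "r2", "k2"),
]


def score(values):
    """Hypothetical stand-in for the APOC similarity expression (made-up scores)."""
    made_up = {"syn": 1.0, "sync": 0.75}
    return made_up.get(" ".join(values), 0.0)


results = []
for n, r, k in matches:
    # UNWIND keys(n) AS key ... AS score, then collect({key, value, score}) AS keyScores
    key_scores = [{"key": key, "value": n[key], "score": score(n[key])} for key in n]
    # [s IN keyScores WHERE s.score >= 0.8 | s] AS attr_scores: the only score filter
    attr_scores = [s for s in key_scores if s["score"] >= THRESHOLD]
    # WHERE size(attr_scores) > 0: drop rows with no qualifying attribute
    if len(attr_scores) > 0:
        results.append({"n": n, "r": r, "k": k, "attr_scores": attr_scores})

print(results)
```

Unlike the previous query, the second row is dropped entirely rather than returned with an empty list.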
Charchit Kapoor
  • What do you think of the following solution? I would still like to avoid the redundancy and filter the relevant attributes above the threshold only once: `MATCH (n)-[r]->(k) UNWIND keys(n) as key WITH n, r, k, key, round(apoc.text.levenshteinSimilarity(TRIM(REDUCE(mergedString = "", item in n[key] | mergedString + item + " ")), "syn"), 4) as score WITH n, r, k, collect({key:key, value:n[key], score:score}) as keyScores WHERE ANY(s in keyScores WHERE s.score >= 0.8) RETURN n, r, k, [s in keyScores WHERE s.score >= 0.8 | s] AS attr_scores` – SteveS Nov 21 '22 at 10:20
  • I'd be happy to hear your opinion on the above query, @charchit-kapoor – SteveS Nov 21 '22 at 10:23
  • It's good. This can be even better. `MATCH (n)-[r]->(k) UNWIND keys(n) as key WITH n, r, k, key, round(apoc.text.levenshteinSimilarity(TRIM(REDUCE(mergedString = "", item in n[key] | mergedString + item + " ")), "syn"), 4) as score WITH n, r, k, collect({key:key, value:n[key], score:score}) as keyScores WITH n, r, k, [s in keyScores WHERE s.score >= 0.8 | s] AS attr_scores WHERE size(attr_scores) > 0 RETURN *` @SteveS – Charchit Kapoor Nov 21 '22 at 10:50
  • So the second WITH filters a list of items by their attributes? So every time I have a list of items like this one (scores), where each item has fields, I need an extra WITH? @charchit-kapoor – SteveS Nov 21 '22 at 11:54
  • The first `WITH` calculates the scores, and the second `WITH` collects them into a list. The third `WITH` filters the list to keep only the elements with a score >= 0.8, and its `WHERE` also drops the records in which no element has a score >= 0.8. This way, the score filter is applied only once, compared with the previous version, which filtered first in the `ANY` function and then again in the `RETURN` clause. @steveS – Charchit Kapoor Nov 21 '22 at 14:09
  • Thanks a lot, @Charchit-kapoor, great explanation. By the way, can you share a link to a good "bible" of Cypher? – SteveS Nov 21 '22 at 20:29
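One way to see what filtering only once buys is to count how often the `s.score >= 0.8` test runs in each variant. The sketch below uses hypothetical, precomputed `keyScores` lists and models `ANY` with Python's short-circuiting `any()`; it reflects the logic of the two queries, not Neo4j's actual execution plan, which may evaluate things differently.

```python
THRESHOLD = 0.8

# Hypothetical keyScores lists, one per (n, r, k) row.
rows = [
    [{"key": "name", "score": 0.9}, {"key": "tags", "score": 0.5}],
    [{"key": "title", "score": 0.1}],
    [{"key": "a", "score": 0.2}, {"key": "b", "score": 0.85}],
]

checks = 0


def passes(s):
    """The `s.score >= 0.8` predicate, instrumented to count evaluations."""
    global checks
    checks += 1
    return s["score"] >= THRESHOLD


def any_then_return(rows):
    # WHERE ANY(s IN keyScores WHERE ...) RETURN [s IN keyScores WHERE ... | s]
    return [[s for s in ks if passes(s)] for ks in rows if any(passes(s) for s in ks)]


def filter_once(rows):
    # WITH [s IN keyScores WHERE ... | s] AS attr_scores WHERE size(attr_scores) > 0
    filtered = ([s for s in ks if passes(s)] for ks in rows)
    return [attr for attr in filtered if len(attr) > 0]


checks = 0
a = any_then_return(rows)
checks_any = checks

checks = 0
b = filter_once(rows)
checks_once = checks

print(a == b, checks_any, checks_once)  # → True 8 5
```

Both variants return the same rows; the single-filter version just avoids re-testing the elements of rows that already passed `ANY`.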