How to write a Cypher query to count the number of nodes in a graph based on Levenshtein similarity

Question

Hello everyone, I need to write a Cypher query for the scenario below.

Given a list of strings, count the number of nodes in the graph where the Levenshtein similarity between the node's name property and any string from the list is above a certain threshold.

I was able to write the query for a single string, but I am not sure how to write it for multiple strings, e.g. ['string 1', 'string 2', 'string 3'].

MATCH (n:Node)
UNWIND (n.name) as name_lst
RETURN SUM(toInteger(apoc.text.levenshteinSimilarity(name_lst, 'string 1') > 0.6))

Any thoughts on how to adapt the above query to handle multiple strings?

you want to check if any of the strings in the list has a levSim > 0.6 to name? Is it correct? — jose_bacoy, Feb 03 '23 at 13:19

score 1 · Answer 1 · answered Feb 03 '23 at 11:20

One option is to use reduce:

MATCH (n:Node)
WITH toInteger(reduce(maxValSoFar = 0, 
  s IN ['string 1', 'string 2', 'string 3'] | 
  apoc.coll.max([maxValSoFar, apoc.text.levenshteinSimilarity(n.name, s)])) > 
  0.6) AS nodes
RETURN SUM(nodes)

For sample data:

MERGE (a1:Node {name:'string 1'})
MERGE (a2:Node {name:'asdss'})
MERGE (a3:Node {name:'string 2'})
MERGE (a4:Node {name:'afffs'})
MERGE (a5:Node {name:'efwetreyy'})
MERGE (a6:Node {name:'ffuumxt'})

The result is:

╒════════════╕
│"sum(nodes)"│
╞════════════╡
│2           │
└────────────┘

score 1 · Accepted Answer · answered Feb 03 '23 at 13:10

There is no need to UNWIND n.name AS name_lst; you can pass n.name directly to the APOC function.

If any string in the list ['string 1', 'string 2', 'string 3'] has a levSim value > 0.6 against n.name, ANY returns true for that node. Converting true to an integer gives 1 (and false gives 0).

Thus, summing all the 1s gives the number of nodes whose name property has a levSim value > 0.6 to at least one string in the list ['string 1', 'string 2', 'string 3'].

MATCH (n:Node)
RETURN SUM(toInteger(ANY(s in ['string 1', 'string 2', 'string 3'] 
      WHERE apoc.text.levenshteinSimilarity(n.name, s ) > 0.6)))
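
To sanity-check what this query counts, here is a rough Python sketch of the same logic run against the sample data from the other answer. It is not how APOC is implemented; the helper names (levenshtein_distance, similarity, count_similar) are made up for illustration, and the normalisation 1 - distance / length of the longer string is my assumption about how apoc.text.levenshteinSimilarity scales the distance.

```python
def levenshtein_distance(a: str, b: str) -> int:
    """Edit distance via dynamic programming (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute (or match)
            ))
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Assumed normalisation: 1 - distance / length of the longer string."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein_distance(a, b) / longest


def count_similar(names, targets, threshold=0.6):
    """Count names whose similarity to ANY target exceeds the threshold."""
    return sum(
        1 for name in names
        if any(similarity(name, t) > threshold for t in targets)
    )


# Sample data from the other answer in this thread.
names = ['string 1', 'asdss', 'string 2', 'afffs', 'efwetreyy', 'ffuumxt']
targets = ['string 1', 'string 2', 'string 3']
print(count_similar(names, targets))  # prints 2
```

Each node is counted at most once, even if it is similar to several strings in the list, which is the same effect ANY has in the Cypher query.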
