
Hello everyone, I need to write a Cypher query for the scenario below.

Given a list of strings, count the number of nodes in the graph where the Levenshtein similarity between the node's name property and the strings from the list exceeds a certain threshold.

I was able to write the query for a single string, but I am not sure how to write it for multiple strings, e.g. ['string 1', 'string 2', 'string 3'].

MATCH (n:Node)
UNWIND (n.name) as name_lst
RETURN SUM(toInteger(apoc.text.levenshteinSimilarity(name_lst, 'string 1') > 0.6))

Any thoughts on how to adapt the above query to handle multiple strings?

Atinesh

2 Answers


One option is to use reduce:

MATCH (n:Node)
WITH toInteger(reduce(maxValSoFar = 0, 
  s IN ['string 1', 'string 2', 'string 3'] | 
  apoc.coll.max([maxValSoFar, apoc.text.levenshteinSimilarity(n.name, s)])) > 
  0.6) AS nodes
RETURN SUM(nodes)

For sample data:

MERGE (a1:Node {name:'string 1'})    
MERGE (a2:Node {name:'asdss'})   
MERGE (a3:Node {name:'string 2'})
MERGE (a4:Node {name:'afffs'})
MERGE (a5:Node {name:'efwetreyy'})
MERGE (a6:Node {name:'ffuumxt'})

The result is:

╒════════════╕
│"sum(nodes)"│
╞════════════╡
│2           │
└────────────┘
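To see why the count comes out as it does, it can help to look at the best similarity per node before thresholding. The query below is a minimal sketch of that idea, assuming the same sample data and string list as above; the aliases sims, name and bestSimilarity are illustrative names only.

```cypher
// For each node, compute the similarity of its name to every string in the list,
// then keep the highest value so it can be compared against the 0.6 threshold.
MATCH (n:Node)
WITH n, [s IN ['string 1', 'string 2', 'string 3'] |
         apoc.text.levenshteinSimilarity(n.name, s)] AS sims
RETURN n.name AS name, apoc.coll.max(sims) AS bestSimilarity
ORDER BY bestSimilarity DESC
```

Nodes whose name exactly equals one of the strings get a best similarity of 1.0, so they are counted for any threshold below 1.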
nimrod serok

There is no need to UNWIND the name as name_lst; you can pass n.name directly to the APOC function.

ANY returns true if at least one string in the list ['string 1', 'string 2', 'string 3'] has a Levenshtein similarity to n.name greater than 0.6, and toInteger converts true to 1 (and false to 0).

Summing these values therefore gives the number of nodes whose name property has a Levenshtein similarity greater than 0.6 to at least one string in the list ['string 1', 'string 2', 'string 3'].

MATCH (n:Node)
RETURN SUM(toInteger(ANY(s IN ['string 1', 'string 2', 'string 3']
      WHERE apoc.text.levenshteinSimilarity(n.name, s) > 0.6)))
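The same count can probably also be obtained by filtering in a WHERE clause and counting the surviving nodes, which avoids the boolean-to-integer conversion. The sketch below assumes the string list and the threshold are supplied as query parameters; the names $strings and $threshold and the alias matchingNodes are hypothetical and not part of the original question.

```cypher
// Hypothetical parameters: $strings (list of strings) and $threshold (float).
// In Neo4j Browser they could be set with, for example:
//   :param strings => ['string 1', 'string 2', 'string 3']
//   :param threshold => 0.6
MATCH (n:Node)
// Keep a node if its name is similar enough to at least one string in the list.
WHERE any(s IN $strings
          WHERE apoc.text.levenshteinSimilarity(n.name, s) > $threshold)
RETURN count(n) AS matchingNodes
```

Passing the list as a parameter keeps the query text unchanged when the strings or the threshold change.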
jose_bacoy