
In a graph of persons, some of the nodes are connected by a SAME_AS relationship:

(p1:Person {name:'m.Verena von Habsburg-Laufenburg'})-[:SAME_AS]-(p2:Person {name:'2m: 9.2.1354 Verena von Habsburg-Laufenburg'})

In the first example the two persons really are the same, but there are also cases like this one:

(p1:Person {name:'m.Gf Antal Pejácsevich de Verõcze (+1838)'})-[:SAME_AS]-(p2:Person {name:'2m: Budapest 5.7.1880 Gf Arthur Pejácsevich de Verõcze'})

Can apoc.text.phonetic help decide whether two persons linked by SAME_AS are really the same?

techie95
Andreas Kuczera

1 Answer


You can judge for yourself.

Your first example

WITH [
    "m.Verena von Habsburg-Laufenburg",
    "2m: 9.2.1354 Verena von Habsburg-Laufenburg"
] AS texts
UNWIND texts AS text
CALL apoc.text.phonetic(text) YIELD value
RETURN text, value

The results are the same:

text                                            value
"m.Verena von Habsburg-Laufenburg"              "M000V650V500H121L151"
"2m: 9.2.1354 Verena von Habsburg-Laufenburg"   "M000V650V500H121L151"

Your second example

WITH [
    "m.Gf Antal Pejácsevich de Verõcze (+1838)",
    "2m: Budapest 5.7.1880 Gf Arthur Pejácsevich de Verõcze"
] AS texts
UNWIND texts AS text
CALL apoc.text.phonetic(text) YIELD value
RETURN text, value

The results are not the same:

text                                                        value
"m.Gf Antal Pejácsevich de Verõcze (+1838)"                 "M000G100A534P200C120D000V600C000"
"2m: Budapest 5.7.1880 Gf Arthur Pejácsevich de Verõcze"    "M000B312G100A636P200C120D000V600C000"
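
To see where these codes come from, here is a minimal Python sketch that reproduces them. It assumes (inferred from the outputs above, not taken from the APOC source) that apoc.text.phonetic splits the text on every character outside A-Z, a-z, 0-9 and `_`, encodes each word with American Soundex, skips words that contain no letters, and concatenates the codes. The helper names `soundex` and `phonetic` are hypothetical.

```python
import re

# American Soundex digit for each consonant; vowels and Y have no digit,
# H and W are treated as silent.
CODES = {ch: digit
         for letters, digit in [("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                                ("L", "4"), ("MN", "5"), ("R", "6")]
         for ch in letters}


def soundex(word: str) -> str:
    """American Soundex code of one word, or "" if it has no ASCII letters."""
    letters = [c.upper() for c in word if c.isascii() and c.isalpha()]
    if not letters:
        return ""  # e.g. "1354" or "9" contribute nothing
    out = [letters[0]]
    last = CODES.get(letters[0], "")
    for ch in letters[1:]:
        if len(out) == 4:
            break
        if ch in "HW":
            continue  # silent: does not separate two letters with the same digit
        code = CODES.get(ch, "")
        if code and code != last:
            out.append(code)
        last = code  # a vowel resets `last`, so a repeated digit after it counts again
    return "".join(out).ljust(4, "0")


def phonetic(text: str) -> str:
    """Per-word Soundex codes, concatenated (assumed to mimic apoc.text.phonetic)."""
    # Assumption: words are split like Java's \W+, so anything outside
    # [A-Za-z0-9_], including accented letters such as 'á' or 'õ', is a separator.
    words = re.split(r"[^A-Za-z0-9_]+", text)
    return "".join(soundex(w) for w in words)


if __name__ == "__main__":
    for name in ["m.Verena von Habsburg-Laufenburg",
                 "2m: 9.2.1354 Verena von Habsburg-Laufenburg",
                 "m.Gf Antal Pejácsevich de Verõcze (+1838)",
                 "2m: Budapest 5.7.1880 Gf Arthur Pejácsevich de Verõcze"]:
        print(name, phonetic(name))
```

Under these assumptions the sketch returns exactly the four codes in the tables above. It also shows why "Pejácsevich" yields two codes (P200 C120): the accented "á" splits the word in two.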

Conclusion

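It works for this example. In the first pair, the extra prefix "2m: 9.2.1354" changes nothing because digits have no phonetic code. In the second pair, "Budapest" adds B312 and "Antal" (A534) differs from "Arthur" (A636). But I'm not sure you can turn this into a generic rule: any extra word, such as a place name, changes the code even when both records describe the same person. Data lineage is complex to achieve, and there is no guarantee of being 100% sure. But apoc.text.phonetic can definitely help you achieve your goal.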

Update

Your query should look like this:

MATCH (n1:Person)-[r:SAME_AS]->(n2:Person)
// Compute the phonetic code of both names
CALL apoc.text.phonetic(n1.name) YIELD value AS n1Phonetic
CALL apoc.text.phonetic(n2.name) YIELD value AS n2Phonetic
// Keep only the pairs whose codes are identical
WHERE n1Phonetic = n2Phonetic
WITH r
    SET r.samePhonetic=true

This sets the property samePhonetic to true on every SAME_AS relationship whose two names have identical phonetic codes.

There is also another procedure, apoc.text.phoneticDelta, that can help you with this. With it you can define a threshold, or store the delta directly as a property of your relationship, like this:

MATCH (n1:Person)-[r:SAME_AS]->(n2:Person)
CALL apoc.text.phoneticDelta(n1.name, n2.name) YIELD delta
WITH r, delta
    SET r.phoneticDelta=delta

A delta of 4 means that your two strings are phonetically very similar; a delta of 0 means that they are very different.
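
For intuition about the 0 to 4 scale, here is a minimal Python sketch. It assumes that apoc.text.phoneticDelta behaves like Apache Commons Codec's `Soundex.difference`: each input string is reduced to a single four-character Soundex code, and the score is the number of positions where the two codes agree. The names `soundex` and `phonetic_delta` are hypothetical, and unlike Commons Codec this sketch simply drops non-ASCII letters.

```python
# American Soundex digit for each consonant; vowels and Y have no digit,
# H and W are treated as silent.
CODES = {ch: digit
         for letters, digit in [("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                                ("L", "4"), ("MN", "5"), ("R", "6")]
         for ch in letters}


def soundex(text: str) -> str:
    """One Soundex code for the whole string (spaces, digits, punctuation dropped)."""
    letters = [c.upper() for c in text if c.isascii() and c.isalpha()]
    if not letters:
        return ""
    out = [letters[0]]
    last = CODES.get(letters[0], "")
    for ch in letters[1:]:
        if len(out) == 4:
            break
        if ch in "HW":
            continue  # silent letters do not reset `last`
        code = CODES.get(ch, "")
        if code and code != last:
            out.append(code)
        last = code  # vowels reset `last`
    return "".join(out).ljust(4, "0")


def phonetic_delta(a: str, b: str) -> int:
    """Assumed delta: count of positions where the two Soundex codes match (0-4)."""
    return sum(1 for x, y in zip(soundex(a), soundex(b)) if x == y)


if __name__ == "__main__":
    print(phonetic_delta("Smith", "Smyth"))   # both S530
    print(phonetic_delta("Antal", "Arthur"))  # A534 vs A636
```

Here `phonetic_delta("Smith", "Smyth")` gives 4 and `phonetic_delta("Antal", "Arthur")` gives 2. If the assumption holds, only the first few consonants of the whole name feed the score, so a delta threshold is coarser than comparing the per-word codes returned by apoc.text.phonetic.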

logisima
  • Thanks. I used your query and built a generic one for my database: `MATCH (n1:Person)-[:SAME_AS]-(n2:Person) WITH COLLECT([n1.name, n2.name]) AS texts UNWIND texts AS text CALL apoc.text.phonetic(text) YIELD value RETURN text, value` Is there a way to mark the SAME_AS edge if the values are equal? – Andreas Kuczera Feb 13 '18 at 12:19