You can judge for yourself.
Your first example
WITH [
"m.Verena von Habsburg-Laufenburg",
"2m: 9.2.1354 Verena von Habsburg-Laufenburg"
] AS texts
UNWIND texts AS text
CALL apoc.text.phonetic(text) YIELD value
RETURN text, value
The results are the same:
text value
"m.Verena von Habsburg-Laufenburg" "M000V650V500H121L151"
"2m: 9.2.1354 Verena von Habsburg-Laufenburg" "M000V650V500H121L151"
Your second example
WITH [
"m.Gf Antal Pejácsevich de Verõcze (+1838)",
"2m: Budapest 5.7.1880 Gf Arthur Pejácsevich de Verõcze"
] AS texts
UNWIND texts AS text
CALL apoc.text.phonetic(text) YIELD value
RETURN text, value
The results are not the same:
text value
"m.Gf Antal Pejácsevich de Verõcze (+1838)" "M000G100A534P200C120D000V600C000"
"2m: Budapest 5.7.1880 Gf Arthur Pejácsevich de Verõcze" "M000B312G100A636P200C120D000V600C000"
Conclusion
It works for this example, but I'm not sure you can turn it into a generic rule. Data lineage is complex to achieve, and you have no guarantee of being 100% sure.
But apoc.text.phonetic can definitely help you achieve your goal.
Update
Your query should look like this:
MATCH (n1:Person)-[r:SAME_AS]->(n2:Person)
CALL apoc.text.phonetic(n1.name) YIELD value AS n1Phonetic
CALL apoc.text.phonetic(n2.name) YIELD value AS n2Phonetic
WHERE n1Phonetic = n2Phonetic
WITH r
SET r.samePhonetic=true
Here I set the property samePhonetic to true if the phonetics are the same.
Moreover, there is another procedure, apoc.text.phoneticDelta, that can help you with this. With it you can define a threshold, or directly store the delta as a property of your relationship, like this:
MATCH (n1:Person)-[r:SAME_AS]->(n2:Person)
CALL apoc.text.phoneticDelta(n1.name, n2.name) YIELD delta
WITH r, delta
SET r.phoneticDelta=delta
A score of 4 means that your two strings are very similar.
A score of 0 means that your two strings are very different.
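If you prefer the threshold approach, it could look something like the query below. This is only a sketch: the cutoff of 3 is an assumed value that you would need to tune on your own data, and it reuses the samePhonetic property from the earlier query.

```cypher
// Flag SAME_AS relationships whose two names are phonetically close.
// The cutoff of 3 is an assumption for illustration, not a recommended value.
MATCH (n1:Person)-[r:SAME_AS]->(n2:Person)
CALL apoc.text.phoneticDelta(n1.name, n2.name) YIELD delta
WITH r, delta
WHERE delta >= 3
// Keep the raw delta as well, so the cutoff can be revisited later.
SET r.samePhonetic = true, r.phoneticDelta = delta
```

A cutoff of 4 keeps only the pairs with the highest score. A lower cutoff lets more candidates through, at the cost of more false positives.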