So the following command gives me a match:
agrep(pattern = 'Ned Stark', x = 'Ned Stark**DUPL ENTRY', max.distance = 0.15, costs = NULL, ignore.case = TRUE, value = FALSE, fixed = TRUE, useBytes = FALSE)
But when I swap the two strings, I no longer get a match:
agrep(pattern = 'Ned Stark**DUPL ENTRY', x = 'Ned Stark', max.distance = 0.15, costs = NULL, ignore.case = TRUE, value = FALSE, fixed = TRUE, useBytes = FALSE)
- The pattern is searched for *within* the string. In the first case there is a match within the set distance parameter. In the second, there isn't. – Ritchie Sacramento Aug 04 '20 at 14:50
- Is there a way to make sure it matches in both cases without changing max.distance? – Scratch Aug 04 '20 at 15:22
- No, the only way to get a match in the second case is to increase the max distance. Perhaps if you step back and explain what you're trying to achieve, there might be an alternative approach. – Ritchie Sacramento Aug 04 '20 at 15:31
- I need to detect duplicate (not exact) entries in a table. In this case 'Ned Stark' is the same as 'Ned Stark**DUPL ENTRY', and whichever one I use as the 'source' and the other as the 'target', I need consistent matching results for the same distance parameter. Do you think another algorithm or function would be more appropriate, then? – Scratch Aug 04 '20 at 15:46
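
One possible way to get an order-independent result, offered only as a sketch: since agrep looks for the pattern within x, the check can be run in both directions and accepted if either direction matches. The helper name fuzzy_match_either is hypothetical (it is not part of base R); agrepl and adist are the base/utils functions. The adist call is included for comparison, because with its default costs the whole-string edit distance is the same in both directions, though it measures something different from agrep's substring search.

```r
# Hypothetical helper (not part of base R): TRUE if either string
# approximately matches within the other, so argument order does not matter.
fuzzy_match_either <- function(a, b, max.distance = 0.15) {
  agrepl(a, b, max.distance = max.distance, ignore.case = TRUE, fixed = TRUE) ||
    agrepl(b, a, max.distance = max.distance, ignore.case = TRUE, fixed = TRUE)
}

s1 <- 'Ned Stark'
s2 <- 'Ned Stark**DUPL ENTRY'

# Same answer whichever string is treated as the "source"
fuzzy_match_either(s1, s2)
fuzzy_match_either(s2, s1)

# For comparison: whole-string (generalized Levenshtein) distance,
# which is symmetric under the default unit costs
adist(s1, s2, ignore.case = TRUE)
adist(s2, s1, ignore.case = TRUE)
```

Note that the two approaches answer different questions: the wrapper asks whether one entry is approximately contained in the other, while adist compares the entries in full, so a threshold tuned for one will not carry over to the other.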