
The following command gives me a match:
agrep(pattern = 'Ned Stark', x = 'Ned Stark**DUPL ENTRY', max.distance = 0.15, costs = NULL, ignore.case = TRUE, value = FALSE, fixed = TRUE, useBytes = FALSE)
But when I swap the two strings, I no longer get a match:
agrep(pattern = 'Ned Stark**DUPL ENTRY', x = 'Ned Stark', max.distance = 0.15, costs = NULL, ignore.case = TRUE, value = FALSE, fixed = TRUE, useBytes = FALSE)

Scratch
  • The pattern is searched for *within* the string. In the first instance there is a match within the set distance parameter. In the second, there isn't. – Ritchie Sacramento Aug 04 '20 at 14:50
  • Is there a way to make sure it matches in both cases without changing max.distance? – Scratch Aug 04 '20 at 15:22
  • No, the only way to get a match in the second case is to increase the max distance. Perhaps if you step back and explain what you're trying to achieve there might be an alternative approach. – Ritchie Sacramento Aug 04 '20 at 15:31
  • I need to detect duplicate (not exact) entries in the table. In this case 'Ned Stark' is the same as 'Ned Stark**DUPL ENTRY', and whichever one I use as the 'source' and whichever as the 'target', I need consistent matching results for the same distance parameter. Do you think another algorithm or function would be more appropriate? – Scratch Aug 04 '20 at 15:46
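
One possible way to get order-independent results, sketched here rather than offered as a definitive answer: since agrep looks for the pattern *within* x, running the check in both directions and accepting a match from either one makes the result symmetric. The helper name fuzzy_match_either is hypothetical (it is not part of base R); it only wraps agrepl, the logical-valued variant of agrep, with the same max.distance, ignore.case and fixed arguments used in the question.

```r
# Hypothetical helper (not part of base R): returns TRUE if either string
# approximately contains the other, so the argument order does not matter.
fuzzy_match_either <- function(a, b, max.distance = 0.15) {
  agrepl(a, b, max.distance = max.distance, ignore.case = TRUE, fixed = TRUE) ||
    agrepl(b, a, max.distance = max.distance, ignore.case = TRUE, fixed = TRUE)
}

# Both argument orders go through the same pair of checks
fuzzy_match_either('Ned Stark', 'Ned Stark**DUPL ENTRY')
fuzzy_match_either('Ned Stark**DUPL ENTRY', 'Ned Stark')

# For comparison: the full-string edit distance from adist() is also
# symmetric with default costs, but it counts every extra character,
# so the '**DUPL ENTRY' suffix makes the two strings look far apart.
adist('Ned Stark', 'Ned Stark**DUPL ENTRY', ignore.case = TRUE)
```

The helper assumes scalar inputs; for a whole column of names it would need to be applied pairwise, for example with outer() or a loop. Whether "one string approximately contains the other" is the right notion of duplicate depends on the data, so treat this as a starting point.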

0 Answers