I have two lists of names for the same set of students, which were collected separately. There are numerous typographical errors, and I have been using fuzzy matching to link the two lists. I am 99+% there with agrep and similar, but am stuck on the following basic problem: how can I match (for example) the forenames "Adrian Bruce" and "Bruce Adrian"? The Levenshtein edit distance is no good for this particular case, as it counts character-level edits (insertions, deletions and substitutions), so two names made of the same words in a different order still come out far apart.
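To make the problem concrete, here is a minimal sketch in base R. It assumes nothing beyond the two example names above and the default tolerance of agrepl; the variable names are just for illustration.

```r
# Two forenames that contain the same words in a different order
a <- "Adrian Bruce"
b <- "Bruce Adrian"

# Generalized Levenshtein distance between the full strings (utils::adist)
d <- adist(a, b)[1, 1]
print(d)

# Approximate match using agrepl with its default max.distance
m <- agrepl(a, b)
print(m)
```

The distance is large relative to the 12-character strings, and the default agrepl call does not report a match, even though both names consist of exactly the same two words.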
This must be a very common problem, but I cannot find any standard R package or routine for addressing it. I presume I am missing something obvious?