
I have a long list of names in a spreadsheet that I am working with in R. There are a few classic issues with names (corporate or personal), such as in the example below.

DU PONT JEAN
DUPONT JEAN
DUPON T JEAN
DUPONT JEAN
DUPONT J
DU-PONT JEAN
DU POTN JEAN

I am trying to fix a few things, such as removing spaces within names or keeping only the first letter of the first name, but the results are not very satisfying.

As these are very common issues, I wonder if there is a piece of code or a package to deal with them?

Plantekös
  • Possible duplicate of [Efficient string similarity grouping](https://stackoverflow.com/questions/48058104/efficient-string-similarity-grouping) – iod May 02 '19 at 12:11
  • Consider using `agrep`, which tells you whether a string is similar to another. There are more complex implementations of this in the package `stringdist`. – iod May 02 '19 at 12:13
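
To make the suggestion above concrete, here is a minimal sketch in base R only, run on the names from the question. It is one possible approach, not a definitive one: `normalize_name` is a hypothetical helper, and the distance threshold `max_dist`, the `max.distance` value, and the single-linkage clustering are assumptions you would need to tune on real data. It uses `adist` (Levenshtein distance, from `utils`) rather than `stringdist`, so nothing needs to be installed.

```r
# Example names taken from the question
names_raw <- c("DU PONT JEAN", "DUPONT JEAN", "DUPON T JEAN", "DUPONT JEAN",
               "DUPONT J", "DU-PONT JEAN", "DU POTN JEAN")

# Hypothetical helper: drop everything that is not a letter, then upper-case.
# This removes stray spaces and hyphens ("DU-PONT", "DUPON T").
normalize_name <- function(x) toupper(gsub("[^[:alpha:]]", "", x))

keys <- normalize_name(names_raw)

# agrepl() does approximate (substring) matching of one pattern against a vector;
# max.distance = 2L is an assumed tolerance of two edits.
similar_to_first <- agrepl(keys[1], keys, max.distance = 2L)

# Pairwise Levenshtein distances between the normalized keys
d <- adist(keys)

# Group names whose keys are within an assumed edit distance of each other,
# using single-linkage hierarchical clustering cut at that height.
max_dist <- 3
groups <- cutree(hclust(as.dist(d), method = "single"), h = max_dist)

result <- data.frame(name = names_raw, key = keys, group = groups)
print(result)
```

With this small sample and a threshold of 3, all seven names fall into the same group. On a long list, single linkage can chain unrelated names together, so a stricter threshold or a different linkage method may work better, and comparing every pair of names gets expensive quickly.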
