Is there an algorithm / package to automatically fix mistakes in name lists?

Asked May 02 '19 at 12:00

Active May 02 '19 at 12:00

Viewed 27 times

I have a long list of names in a spreadsheet that I am using in R. There are a few classic issues with the names (of companies or persons), as in the example below.

DU PONT JEAN
DUPONT JEAN
DUPON T JEAN
DUPONT JEAN
DUPONT J
DU-PONT JEAN
DU POTN JEAN

I am trying to fix a few things by hand, such as removing spaces inside names or keeping only the first letter of the first name, but the result is not very satisfying.

As these are very common issues, I wonder if there is a piece of code or a package to deal with them?

Plantekös

Possible duplicate of [Efficient string similarity grouping](https://stackoverflow.com/questions/48058104/efficient-string-similarity-grouping) – iod May 02 '19 at 12:11
Consider using `agrep`, which tells you if a string is similar to another. There are more complex implementations of this in the package `stringdist`. – iod May 02 '19 at 12:13
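
To make the suggestion above concrete, here is a minimal sketch in base R, not a definitive solution. It uses `adist` (the Levenshtein distance function that ships with R, in the same family as `agrep`) on the sample names from the question. The helper `normalize_name`, the choice of single-linkage clustering, and the cut-off of 3 edits are all assumptions made for illustration and would need tuning on real data.

```r
# Sample names from the question
names_raw <- c("DU PONT JEAN", "DUPONT JEAN", "DUPON T JEAN", "DUPONT JEAN",
               "DUPONT J", "DU-PONT JEAN", "DU POTN JEAN")

# Step 1 (assumption): spaces, hyphens and case carry no information,
# so reduce every name to its upper-case letters only.
normalize_name <- function(x) gsub("[^A-Z]", "", toupper(x))
keys <- normalize_name(names_raw)

# Step 2: pairwise Levenshtein (edit) distances between the normalized keys.
d <- adist(keys)

# Step 3 (assumption): names within 3 edits of each other, directly or
# through a chain of neighbours (single linkage), are the same entity.
clusters <- cutree(hclust(as.dist(d), method = "single"), h = 3)

# Step 4: within each cluster, use the most frequent raw spelling as the
# canonical name (ties go to the alphabetically first spelling).
most_common <- tapply(names_raw, clusters,
                      function(g) names(which.max(table(g))))
canonical <- unname(most_common[as.character(clusters)])

print(data.frame(raw = names_raw, cluster = clusters, canonical = canonical))
```

With this cut-off all seven sample names land in one cluster and are mapped to `DUPONT JEAN`. The threshold is a trade-off: a lower value would leave `DUPONT J` on its own, while a higher one risks merging genuinely different people, so abbreviated first names in particular deserve a manual check.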
