Hello, I need some advice on string matching.
I have a large dataset of investors and various deals they have been involved with.
Some of the investors and their deal data are not relevant to my investigation, and I have a master list of approved investors whose data is relevant.
I believe I can fuzzy string match this master list of approved investors against my full dataset to identify the rows which contain relevant information.
Is it possible to use fuzzy matching to match the lists and then generate a value of 0 or 1 in a column to distinguish approved vs not approved?
I'm using R and the fuzzywuzzyR package, but I'm open to other suggestions too.
A fictional example of the data is below. In reality there are 23 columns of variables. The master list of approved investors is simply a long list of names.
Investor | Deal Size | Deal Date |
---|---|---|
3i Group | 4.5 | 20/03/19 |
123 IM | 2.3 | 12/04/18 |
Ørsted | 6.7 | 25/08/17 |
KKR | 7.4 | 23/09/17 |
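
To make the goal concrete, here is a rough sketch of the kind of output I'm after. It uses base R's `adist()` (edit distance) instead of fuzzywuzzyR, purely as an illustration. The `approved` vector, the 0.6 cutoff, and the `Approved` and `Best.Match` column names are all made up for this example, and the cutoff would certainly need tuning on the real data.

```r
# Fictional deal data, mirroring the example table above
deals <- data.frame(
  Investor  = c("3i Group", "123 IM", "\u00d8rsted", "KKR"),  # \u00d8 is the letter in "Ørsted"
  Deal.Size = c(4.5, 2.3, 6.7, 7.4),
  Deal.Date = c("20/03/19", "12/04/18", "25/08/17", "23/09/17"),
  stringsAsFactors = FALSE
)

# Hypothetical master list, spelled slightly differently from the deal data
approved <- c("3i Group plc", "Orsted", "KKR")

# Edit distance between every investor (rows) and every approved name (columns)
d <- utils::adist(deals$Investor, approved, ignore.case = TRUE)

# Scale each distance by the longer of the two names to get a similarity in [0, 1]
max_len <- outer(nchar(deals$Investor), nchar(approved), pmax)
sim <- 1 - d / max_len

# Assumed cutoff: a row counts as approved if its best similarity reaches it
threshold <- 0.6
best_sim <- apply(sim, 1, max)
deals$Approved <- as.integer(best_sim >= threshold)

# Keep the closest master-list name alongside the flag so matches can be checked by eye
deals$Best.Match <- approved[apply(sim, 1, which.max)]

print(deals)
```

With these made-up names, 3i Group, Ørsted and KKR get `Approved = 1` and 123 IM gets 0. Scaling by the longer name penalises abbreviations heavily (for example "KKR" against "KKR & Co" would score well under 0.6), so I'm not sure plain edit distance is the right measure for my real list.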