I am trying to match two datasets in R: datasetA and datasetB. These datasets contain the following columns.
datasetA
- ID: 15
- Name: peter sanders
- First_Name: peter
- Last_Name: sanders
- ORG_NAME: coffee&cake
- City: New York
- Amount(USD): 10369
- Category: food & beverages
- Date: 12/01/2014
datasetB has similar columns:
- ORG_ID: 5241
- names: peter sander
- first name: peter
- last name: sander
- company_name: coffee and cakes
- location: New York
- funded: 10000
- sub_cat: restaurants
- start_date: 2013-01-09 16:42:56
- end_date: 2015-01-04 11:43:39
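For reproducibility, the two example rows could be written as data frames roughly like this. This is only a sketch: the column names that contain spaces or parentheses are renamed to syntactic R names (first_name, last_name, Amount_USD), which is my own assumption, and the dates are kept as plain strings because their formats differ between the two datasets.

```r
# Example row of datasetA, values copied from the list above.
# Amount_USD stands in for "Amount(USD)" (renamed, assumption).
datasetA <- data.frame(
  ID = 15,
  Name = "peter sanders",
  First_Name = "peter",
  Last_Name = "sanders",
  ORG_NAME = "coffee&cake",
  City = "New York",
  Amount_USD = 10369,
  Category = "food & beverages",
  Date = "12/01/2014",          # kept as text, format not parsed here
  stringsAsFactors = FALSE
)

# Example row of datasetB, values copied from the list above.
# first_name / last_name stand in for "first name" / "last name" (assumption).
datasetB <- data.frame(
  ORG_ID = 5241,
  names = "peter sander",
  first_name = "peter",
  last_name = "sander",
  company_name = "coffee and cakes",
  location = "New York",
  funded = 10000,
  sub_cat = "restaurants",
  start_date = "2013-01-09 16:42:56",
  end_date = "2015-01-04 11:43:39",
  stringsAsFactors = FALSE
)
```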
Apart from the city, the only exact match is the first name 'peter'. But my datasets contain many companies, so there will be many people named 'peter' who are not the same person. Therefore, I want to match on similarity across multiple columns.
I want to match these two datasets based on the information in all columns. I think I need Levenshtein similarity and compare.linkage (from the RecordLinkage package) for this, but I have not been able to get it to work.
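To make concrete what I mean by Levenshtein similarity across several columns, here is a minimal sketch using only base R's adist(). The helper name lev_sim, the choice of which columns to pair up, and the equal weighting of the columns are all my own assumptions, not an established method, and it does not use compare.linkage.

```r
# Normalised Levenshtein similarity for paired strings:
# 1 - (edit distance / length of the longer string), so 1 means identical.
# lev_sim is a hypothetical helper; adist() comes with base R (utils).
lev_sim <- function(a, b) {
  d <- mapply(function(x, y) adist(x, y)[1, 1], a, b, USE.NAMES = FALSE)
  unname(1 - d / pmax(nchar(a), nchar(b)))
}

# Text fields of the example record in datasetA ...
a <- c("peter sanders", "peter", "sanders", "coffee&cake", "New York")
# ... and the corresponding fields of the example record in datasetB.
b <- c("peter sander", "peter", "sander", "coffee and cakes", "New York")

# One similarity per column pair (name, first name, last name, company, city).
sims <- lev_sim(a, b)
names(sims) <- c("name", "first", "last", "org", "city")
print(round(sims, 2))

# A single score per candidate pair: here simply the unweighted mean (assumption).
score <- mean(sims)
print(score)
```

The idea would be to compute such a score for each candidate pair of rows and keep the pairs above some cutoff, but I do not know how to choose that cutoff or how to do this efficiently for full datasets.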
Does anyone know how I can match these datasets? Any help would be greatly appreciated.