Goal: Merge two excel files that will have significant overlap, but overwrite ONLY the phone numbers and record ID of one data set.
What I have been doing: Just brute force de-dup in excel where I copy over the sheet with phone numbers, sort the ID column, identify/highlight duplicates, and drag "up" the phone numbers to fill in the empty space for the matching ID. The process isn't hard, but with more records it starts to get absurdly tedious. These data look like this in the merged but not de-duplicated Excel file. Or just in plain text:
555555 | Joe | Copy | DOB |AGE | 555 Data Road | DataVille | LA |ZIP|County|(**PHONE GOES HERE**) |Male|White|Doc Name|More info
555555| Joe| Copy |DOB| AGE| 555 Data Road|DataVille|LA| ZIP|County| 555555555 (Phone)
And the phone should be added to the space between County and Gender for every record that matches the two ID's (first number in the record).
Attempts in R:
df_final <- merge(df_noPhone, df_Phone, by = c("Record_ID"), all.x = T)
But this just duplicates the columns ("PatientAddress.x" etc.) and I need those to be synced up for the records to be complete.
The real tricky part is, though that it isn't consistent this way throughout the data. Sometimes we simply don't have phone numbers for certain records and we still want to retain them in the data.
Suggestions? I've tried merging with almost every package I can imagine but sometimes it ends up creating more work in the direct, raw data file afterwards than it's worth.
Thanks!