I have two different dataframes in R that I am trying to merge together. One is just a set of names and the other is a set of names with corresponding information about each person.
So say I want to take this first dataframe:
Name
1. Blow, Joe
2. Smith, John
3. Jones, Tom
etc....
and merge it to this one:
DonorName CandidateName DonationAmount CandidateParty
1 blow joe Bush, George W 3,000 Republican
2 guy some Obama, Barack 5,000 Democrat
3 smith john Reid, Harry 4,000 Democrat
such that I'd have a new list that includes only people on my first list with the information from the second. Were the two "Name" values formatted in the same way, I could just use merge(), but would there be a way to somehow use agrep() or pmatch() to do this?
Also, the 2nd dataframe I'm working with has about 25 million rows in it and 6 columns, so would making a for loop be the fastest way to go about this?
Reproducible versions of the example data:
first <- data.frame(Name=c("Blow, Joe","Smith, John","Jones, Tom"),
stringsAsFactors=FALSE)
second <- read.csv(text="
DonorName|CandidateName|DonationAmount|CandidateParty
blow joe|Bush, George W|3,000|Republican
guy some|Obama, Barack|5,000|Democrat
smith john|Reid, Harry|4,000|Democrat",header=TRUE,sep="|",
stringsAsFactors=FALSE)