I have two data frames with different column names, each with 10 rows. I want to compare the column values and, where they match, copy the email address from df2 to df1. I've looked at the example in "How to join (merge) data frames (inner, outer, left, right)?", but my column names are different. I've also seen an example of np.where with more than one condition, but when I try that I get the following error:
ValueError: Wrong number of items passed 2, placement implies 1
What I want to do:
I want to compare the two columns (first, last_huge) of each row of df1, starting with the first row, against the columns (first_small, last_small) of all rows of df2. If a match is found, I want to get the email address from that row of df2 and assign it to a new column in df1. Can anyone please help me with this? I've only copied the relevant code below, and it's not fully working: it only adds 5 records to new_email.
Initially, I compared df1['first'] with df2['first_small']:
import numpy as np
import pandas as pd
data1 = {"first":["alice", "bob", "carol"],
"last_huge":["foo", "bar", "baz"],
"street_huge": ["Jaifo Road", "Wetib Ridge", "Ucagi View"],
"city_huge": ["Egviniw", "Manbaali", "Ismazdan"],
"age_huge": ["23", "30", "36"],
"state_huge": ["MA", "LA", "CA"],
"zip_huge": ["89899", "78788", "58999"]}
df1 = pd.DataFrame(data1)
data2 = {"first_small":["alice", "bob", "carol"],
"last_small":["foo", "bar", "baz"],
"street_small": ["Jsdffo Road", "sdf Ridge", "sdfff View"],
"city_huge": ["paris", "london", "rome"],
"age_huge": ["28", "40", "56"],
"state_huge": ["GA", "EA", "BA"],
"zip_huge": ["89859", "78728", "56999"],
"email_small":["alice@xyz.com", "bob@abc.com", "carol@jkl.com"],
"dob": ["31051989", "31051980", "31051981"],
"country": ["UK", "US", "IT"],
"company": ["microsoft", "apple", "google"],
"source": ["bing", "yahoo", "google"]}
df2 = pd.DataFrame(data2)
df1['new_email'] = np.where((df1[['first']] == df2[['first_small']]), df2[['email_small']], np.nan)
Now it only adds 5 records to new_email and the rest are NaN, and it shows me this error:
ValueError: Can only compare identically-labeled Series objects
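For reference, the matching I describe above looks like a left join on two pairs of differently named key columns. Below is a minimal sketch of that idea, not a confirmed fix: it uses small stand-in frames rather than my real 10-row data, and the names lookup, merged and result are only illustrative. It relies on the left_on and right_on parameters of DataFrame.merge, and it assumes that each (first_small, last_small) pair should contribute at most one email, so duplicate pairs in df2 are dropped first.

```python
import pandas as pd

# Stand-in data with only the columns needed for the match (assumed, not the real frames).
df1 = pd.DataFrame({
    "first": ["alice", "bob", "carol"],
    "last_huge": ["foo", "bar", "baz"],
})
df2 = pd.DataFrame({
    "first_small": ["carol", "alice", "dave"],
    "last_small": ["baz", "foo", "qux"],
    "email_small": ["carol@jkl.com", "alice@xyz.com", "dave@example.com"],
})

# Keep only the key columns plus the email, and drop repeated key pairs
# so the left merge cannot add extra rows to df1.
lookup = (
    df2[["first_small", "last_small", "email_small"]]
    .drop_duplicates(subset=["first_small", "last_small"])
)

# Left merge: every row of df1 is kept; rows without a match get NaN for the email.
merged = df1.merge(
    lookup,
    how="left",
    left_on=["first", "last_huge"],
    right_on=["first_small", "last_small"],
)

# Remove the duplicated key columns from df2 and give the email column its new name.
result = (
    merged.drop(columns=["first_small", "last_small"])
    .rename(columns={"email_small": "new_email"})
)
print(result)
```

In this sketch the row order of df2 does not matter, because the rows are paired by key values rather than by position; a row of df1 with no matching pair in df2 (bob here) ends up with NaN in new_email.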