0

I have two data frames, df1 (8852 X 16) and df2 (302 X 4). In df2 the columns are poc_fristname, poc_lastname, phone, and email. in df1 it has all of these columns plus many other columns to it.

I am trying to match multiple values from df2 to df1 and grab the third column in df1 so I can do some insights. I have been attempting isin() and also have been attempting .merge but only seem to grab a couple of values or none that match. I believe this is due to different names being used, different phone numbers being used and different emails.

So in order to catch the third value column, I am looking at poc_lastname, email, and phone unique identifiers.

example dataset:

df1

affiliate_code  poc_firstname   poc_lastname    business_name               phone       email   
AFF0007070      Jamel           Zright          Jamel Zright                5555555555  jamel@zr.co 
BAM0000001      Brandon         Elison          Michael Brandon Elison      6666666666  bellison@sygrp.com
BGA2631157      Tarry           Zimmerman       Tarry Jon Zimmerman         7777777777  barryz@ba.com
CGD18880697     Crena           Heman           Crena Heman                      NaN     NaN    
CGD8401802      Stevie          Genciik         Stevie Genciik              8888888888  info@edirect.com    

df2

    poc_firstname   poc_lastname    phone           email
0   Aron            Lai             6.789793e+09    aron.lai1@icloud.com
1   Aron            Sweat           4.695152e+09    aronts@gmail.com
2   Aaron           Harwod          5.072724e+09    aaron@arh.com
3   Abby            Bedient         3.055190e+09    abby@i.com
4   Adam            Howard          2.484597e+09    ahoward@gmail.com

expected output:

    affiliate_code  poc_lastname    phone              email
0   BGA2631157      Sweat           4.695152e+09       aaron.lane1@icloud.com
1   CGD8401802      Bedient         3.055190e+09       aarontsweet@gmail.com

Code tried:

df1.merge(df2, left_on='email', right_on='email')[['affiliate_code']]

df3 = df2[df2['phone'].isin(df1['phone'])]

Each attempt gets some of the matching values or none at all.

If any clarification is needed, please let me know. Working on providing clear questions.

Astro_raf
  • 57
  • 5

1 Answers1

0

This is df1 As your dataframe does not had any common values in first and last name, I have add one more row to it

enter image description here

This is df2

enter image description here

# Merging
df = pd.merge(df1,df2,on=['poc_firstname','poc_lastname'], how='left')

Let me know if this solves your problem

enter image description here

  • Sorry for that, I just assume everyone knew there were common values among both dataframes. When I try to do this with `phone` it says 'ValueError: You are trying to merge on object and float64 columns. If you wish to proceed you should use pd.concat'. Can you do this with merge? – Astro_raf Oct 17 '22 at 20:44
  • Unfortunately, this does not work. I get all the rows for df1. – Astro_raf Oct 17 '22 at 20:45
  • If you what only those rows where the values match, then use `how = 'inner' `in pd.merge. It will disply only those rows which are matched. Check the datatype of the columns in both dataframe. Change the datatype of phone to `int` – Prathamesh Sawant Oct 18 '22 at 04:06