2

I am totally new to Python and just learning with some use cases I have.

I have 2 Data Frames, one is where I need the values in the Country Column, and another is having the values in the column named 'Countries' which needs to be mapped in the main Data Frame referring to the column named 'Data'. (Please accept my apology if this question has already been answered)

Below is the Main DataFrame:

Name Data                     | Country
----------------------------- | ---------
Arjun Kumar Reddy las Vegas   |
Divya london Khosla           |
new delhi Pragati Kumari      |
Will London Turner            |
Joseph Mascurenus Bombay      |
Jason New York Bourne         |
New york Vice Roy             |
Joseph Mascurenus new York    |
Peter Parker California       |
Bruce (istanbul) Wayne        |

Below is the Referenced DataFrame:

Data           | Countries
-------------- | ---------
las Vegas      | US
london         | UK
New Delhi      | IN
London         | UK
bombay         | IN
New York       | US
New york       | US
new York       | US
California     | US
istanbul       | TR
Moscow         | RS
Cape Town      | SA

And what I want in the result will look like below:

Name Data                     | Country
----------------------------- | ---------
Arjun Kumar Reddy las Vegas   | US
Divya london Khosla           | UK
new delhi Pragati Kumari      | IN
Will London Turner            | UK
Joseph Mascurenus Bombay      | IN
Jason New York Bourne         | US
New york Vice Roy             | US
Joseph Mascurenus new York    | US
Peter Parker California       | US
Bruce (istanbul) Wayne        | TR

Please note, Both the dataframes are not same in size. I though of using map or Fuzzywuzzy method but couldn't really achieved the result.

Aakash
  • 39
  • 10
  • 5
    Can you post your dataframes instead of screenshots? – ignoring_gravity Jan 06 '20 at 16:20
  • 1
    Does this answer your question? [How to map one dataframe to another (python pandas)?](https://stackoverflow.com/questions/56969708/how-to-map-one-dataframe-to-another-python-pandas) – APhillips Jan 06 '20 at 16:20
  • typically you would do this by having one column that has the same data between the two dataframes (in this case a city column) and doing a pd.merge. your data is messy though so you need a way to deal with that. – Stael Jan 06 '20 at 16:24
  • @ignoring_gravity Thanks for your quick reply I have pasted it as text now – Aakash Jan 06 '20 at 17:19

2 Answers2

2

Find the country key that matches in the reference dataframe and extract it.

regex = '(' + ')|('.join(ref_df['Data']) + ')'
df['key'] = df['Name Data'].str.extract(regex, flags=re.I).bfill(axis=1)[0]

>>> df
                     Name Data        key
0  Arjun Kumar Reddy las Vegas  las Vegas
1       Bruce (istanbul) Wayne   istanbul
2   Joseph Mascurenus new York   new York


>>> ref_df
        Data Country
0  las Vegas      US
1   new York      US
2   istanbul      TR

Merge both the dataframes on key extracted.

pd.merge(df, ref_df, left_on='key', right_on='Data')
                     Name Data        key       Data Country
0  Arjun Kumar Reddy las Vegas  las Vegas  las Vegas      US
1       Bruce (istanbul) Wayne   istanbul   istanbul      TR
2   Joseph Mascurenus new York   new York   new York      US
Vishnudev Krishnadas
  • 10,679
  • 2
  • 23
  • 55
  • Just one more question related to this. This extracts other useless data as well. Is there a way I can use "Word boundaries" with this regex? Please help. Thanks in advance. – Aakash Jan 07 '20 at 04:42
  • `'(\b' + '\b)|(\b'.join(ref_df['Data']) + '\b)'` With word boundaries. @Aakash – Vishnudev Krishnadas Jan 07 '20 at 06:23
1

It looks like everything is sorted so you can merge on index

mdf.merge(rdf, left_index=True, right_index=True)

Kenan
  • 13,156
  • 8
  • 43
  • 50
  • Sorry, if I put it in a wrong way. But the dataframes are not identical, neither sorted the same way. Though, I have tried what you answered but still not getting the desired result. Thanks again. – Aakash Jan 06 '20 at 17:29