How can I avoid getting extra non-matching values after merging DataFrames?

Question

I have a DataFrame with multiple columns:

import pandas as pd

# List of (Surname, Name, age) records
rows = [('Bass', "Albert", 15), ('Bass', "Daniel", 12), ('Bass', "Paul", 31),
        ('Bass', "Tony", 11), ('Palmer', "Albert", 12), ('Palmer', "Daniel", 22),
        ('Palmer', "Paul", 30), ('Palmer', "Tony", 50), ('Smith', "Albert", 29),
        ('Smith', "Daniel", 9), ('Smith', "Paul", 31), ('Smith', "Tony", 24)]

# Create a DataFrame object
df = pd.DataFrame(rows, columns=['Surname', 'Name', 'age'])

For each Surname, I would like to divide the values in the age column by Paul's age (i.e., for the surname Bass, divide the values by 31; for Palmer, by 30; and so on), store the results in a new column called age_normalized, and have each result line up with the corresponding age_initial.

df = df.set_index("Surname")
df2 = df.loc[df.Name =="Paul"]

At this point it is time to divide these values and merge the DataFrames:

results = df["age"] / df2["age"]
df_merge = df.merge(results, left_index=True, right_index=True,
                    suffixes=("_initial", "_normalized"))

Here is my problem: when I print the merged DataFrame, I get four rows for each name instead of one:

print(results.head())
Surname
Bass      0.483871
Bass      0.387097
Bass      1.000000
Bass      0.354839
Palmer    0.300000
print(df_merge.head())
          Name  age_initial  age_normalized
Surname                                     
Bass     Albert           15        0.483871
Bass     Albert           15        0.387097
Bass     Albert           15        1.000000
Bass     Albert           15        0.354839
Bass     Daniel           12        0.483871

How can I merge the two dataframes and make the initial age match with the corresponding normalized age?

You haven't tagged a programming language in your question or shown us if you've made any attempt to solve this yourself. — David Buck, Nov 12 '20 at 16:04
Welcome to Stackoverflow. Please take the time to read this post on [how to provide a great pandas example](http://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) as well as how to provide a [minimal, complete, and verifiable example](http://stackoverflow.com/help/mcve) and revise your question accordingly. These tips on [how to ask a good question](http://stackoverflow.com/help/how-to-ask) may also be useful. — piterbarg, Nov 12 '20 at 20:13

score 0 · Answer 1 · answered Jan 06 '21 at 13:31

To solve the issue, I merged df with df2 first, instead of with results, and then did the division:

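# Paul's age for each surname (one row per Surname)
df2_area = df2["age"]
# Each row of df now matches exactly one row of df2_area
mer = df.merge(df2_area, on="Surname")
# age_x is the original age, age_y is Paul's age for that surname
mer["age_norm"] = mer["age_x"] / mer["age_y"]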
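
The extra rows in the question appear to come from merging on a key that repeats on both sides: each of the four rows of a surname in df matches each of the four rows of that surname in results. Merging against a table with one row per surname avoids this. The following is a minimal sketch of that idea, not the only way to do it; it keeps Surname as a regular column, and the names rows, paul_age, age_paul and merged are made up for illustration.

```python
import pandas as pd

rows = [
    ("Bass", "Albert", 15), ("Bass", "Daniel", 12), ("Bass", "Paul", 31),
    ("Bass", "Tony", 11), ("Palmer", "Albert", 12), ("Palmer", "Daniel", 22),
    ("Palmer", "Paul", 30), ("Palmer", "Tony", 50), ("Smith", "Albert", 29),
    ("Smith", "Daniel", 9), ("Smith", "Paul", 31), ("Smith", "Tony", 24),
]
df = pd.DataFrame(rows, columns=["Surname", "Name", "age"])

# One reference age per surname: Paul's age, indexed by Surname (unique keys).
# paul_age is an illustrative name, not something from the question.
paul_age = df.loc[df["Name"] == "Paul"].set_index("Surname")["age"]

# Many-to-one merge: every row of df matches exactly one row of paul_age,
# so the result keeps 12 rows. validate= raises if the right-hand keys repeat.
merged = df.merge(
    paul_age.rename("age_paul"),
    left_on="Surname",
    right_index=True,
    validate="many_to_one",
)
merged["age_normalized"] = merged["age"] / merged["age_paul"]

# Same result without a merge: look up Paul's age for each row's surname.
df["age_normalized"] = df["age"] / df["Surname"].map(paul_age)

print(merged[["Surname", "Name", "age", "age_normalized"]].head())
```

The validate="many_to_one" argument is optional; it is there so that an accidental many-to-many merge fails loudly instead of silently multiplying rows.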

Fauz
