I have a DataFrame with multiple columns:
import pandas as pd
# list of (Surname, Name, age) tuples; named `data` to avoid shadowing the built-in `list`
data = [('Bass', "Albert", 15), ('Bass', "Daniel", 12), ('Bass', "Paul", 31),
        ('Bass', "Tony", 11), ('Palmer', "Albert", 12), ('Palmer', "Daniel", 22),
        ('Palmer', "Paul", 30), ('Palmer', "Tony", 50), ('Smith', "Albert", 29),
        ('Smith', "Daniel", 9), ('Smith', "Paul", 31), ('Smith', "Tony", 24)]
# Create a DataFrame object
df = pd.DataFrame(data, columns=['Surname', 'Name', 'age'])
For each Surname, I would like to divide the values in the age column by the age of Paul (i.e., for the surname Bass, divide the values by 31, for Palmer divide them by 30, and so on), store these values in a new column called age_normalized, and have each result line up with its corresponding age_initial.
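A minimal sketch of the intended computation, assuming every surname has exactly one row whose Name is "Paul": Paul's age can be broadcast to every row of the same surname with groupby().transform(), so no merge is needed at all. The data is the same as in the question; the helper names paul_age and ref_age are only for illustration.

```python
import pandas as pd

data = [('Bass', "Albert", 15), ('Bass', "Daniel", 12), ('Bass', "Paul", 31),
        ('Bass', "Tony", 11), ('Palmer', "Albert", 12), ('Palmer', "Daniel", 22),
        ('Palmer', "Paul", 30), ('Palmer', "Tony", 50), ('Smith', "Albert", 29),
        ('Smith', "Daniel", 9), ('Smith', "Paul", 31), ('Smith', "Tony", 24)]
df = pd.DataFrame(data, columns=['Surname', 'Name', 'age'])

# Keep only Paul's age; every other row becomes NaN
paul_age = df['age'].where(df['Name'] == 'Paul')

# Spread Paul's age to every row of the same surname
# ("first" returns the first non-null value in each group)
ref_age = paul_age.groupby(df['Surname']).transform('first')

# Row-wise division: both Series share df's original index
df['age_normalized'] = df['age'] / ref_age
print(df)
```

Because transform() returns a Series aligned to the original rows, the division pairs each age with its own surname's reference age, and the row count stays at 12.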
df = df.set_index("Surname")
df2 = df.loc[df.Name == "Paul"]
At this point it is time to divide these values and merge the DataFrames:
results = df["age"]/df2['age']
merge_df = df.merge(results, left_index=True, right_index=True,
                    suffixes=("_initial", "_normalized"))
Here is my problem: when I print the merged DataFrame, I get 4 rows for each name instead of one:
print(results.head())
Surname
Bass 0.483871
Bass 0.387097
Bass 1.000000
Bass 0.354839
Palmer 0.300000
print(merge_df.head())
Name age_initial age_normalized
Surname
Bass Albert 15 0.483871
Bass Albert 15 0.387097
Bass Albert 15 1.000000
Bass Albert 15 0.354839
Bass Daniel 12 0.483871
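The repeated rows follow from how merge treats duplicate keys: when a join key (here the Surname index) appears m times on the left and n times on the right, every combination is kept, giving m × n rows for that key. Both df and results have 4 rows per surname, so each surname yields 4 × 4 = 16 rows. A small reproduction with toy data (the names left and right are hypothetical):

```python
import pandas as pd

# Two objects whose index repeats the same key, like the Surname index in the question
left = pd.DataFrame({'Name': ['Albert', 'Daniel'], 'age': [15, 12]},
                    index=pd.Index(['Bass', 'Bass'], name='Surname'))
right = pd.Series([0.48, 0.39], name='age',
                  index=pd.Index(['Bass', 'Bass'], name='Surname'))

merged = left.merge(right, left_index=True, right_index=True,
                    suffixes=('_initial', '_normalized'))

# 2 rows on each side share the key 'Bass' -> 2 * 2 = 4 rows
print(len(merged))
```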
How can I merge the two DataFrames so that each initial age is paired only with its own normalized age?
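One possible way to keep a merge, sketched under the assumption that each surname has exactly one Paul: keep Surname as an ordinary column, join every row to Paul's row of the same surname (a many-to-one join, so nothing is duplicated), then divide. Passing validate='many_to_one' makes pandas raise a MergeError if a surname unexpectedly has more than one Paul. The column name age_paul is hypothetical.

```python
import pandas as pd

data = [('Bass', "Albert", 15), ('Bass', "Daniel", 12), ('Bass', "Paul", 31),
        ('Bass', "Tony", 11), ('Palmer', "Albert", 12), ('Palmer', "Daniel", 22),
        ('Palmer', "Paul", 30), ('Palmer', "Tony", 50), ('Smith', "Albert", 29),
        ('Smith', "Daniel", 9), ('Smith', "Paul", 31), ('Smith', "Tony", 24)]
df = pd.DataFrame(data, columns=['Surname', 'Name', 'age'])

# One row per surname holding Paul's age, renamed so it does not clash with 'age'
paul = (df.loc[df['Name'] == 'Paul', ['Surname', 'age']]
          .rename(columns={'age': 'age_paul'}))

# Many-to-one merge on the Surname column: each row gets exactly one partner
merged = df.merge(paul, on='Surname', how='left', validate='many_to_one')

merged = merged.rename(columns={'age': 'age_initial'})
merged['age_normalized'] = merged['age_initial'] / merged['age_paul']

# Match the layout from the question: Surname as index, helper column dropped
merged = merged.drop(columns='age_paul').set_index('Surname')
print(merged.head())
```

The key difference from the original attempt is that the right-hand side has one row per surname, so the join cannot multiply rows.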