Given a file (yourdata.csv
) containing the data outlined in your question, you could use the below example to replace all null values in the GPA
column with the mean GPA
of the corresponding University
for each row.
This solution makes use of df.apply
(docs) to apply the same conditional logic to each row within the df
.
import pandas as pd
import numpy as np
df = pd.read_csv('yourdata.csv')
df['GPA'].fillna(np.nan, inplace=True) # Ensuring null values in GPA col are np.nan
df['GPA'] = df.apply(
lambda row: df[df['University'] == row['University']]['GPA'].mean()
if np.isnan(row['GPA']) else row['GPA'], axis=1
)
If you only want to perform this logic on rows where University == 'Harvard'
(i.e. not fill null GPA values in Stanford, Oxford etc. rows with their respective mean) you could simply add this as a further condition within the lambda
applied to each row to ensure that only rows where University == 'Harvard'
are affected.
# Same imports, original df and fillna as above snippet
df['GPA'] = df.apply(
lambda row: df[df['University'] == row['University']]['GPA'].mean()
if np.isnan(row['GPA']) and row['University'] == 'Harvard'
else row['gpa'], axis=1
)