I have two dataframes, df1 and df2, with 30 columns each. In df1, a set of 7 columns is filled with np.nan. I want to use the entries in those same 7 columns of df2 to fill the NaNs in the corresponding columns of df1. To make sure the NaNs in df1 are filled correctly, I would like to match rows on a unique identifier that is available in both df1 and df2. Keep in mind that this identifier is not the index, since it is repeated multiple times in df1. The problem I ran into is that the methods I tried only allow for one fill, rather than filling every repeated row, which is not what I want.
EDIT:
First, here are the columns that I want to eventually fill in:
cols = ['Analytics Source 1', 'User ID', 'User Email', 'Category', 'Source Title', 'Title', 'Date Created', 'Date Effective Start', 'Date Effective End']
Next, I made a dataframe of all unique identifiers. The only difference between the actual dataframe df and the one below is that df has a high number of repeated unique identifiers:
df_conn = df[df['Principal Type'] != 'user']
df_conn = df_conn.drop_duplicates(subset='Notification ID')
Next, I want to fill in df with the values from df_conn, which should (in theory) populate df throughout, no matter how many repeated unique identifiers there are in df.
df_result = df.set_index('Notification ID').combine_first(df_conn.set_index('Notification ID'))
df_result = df_result.reset_index()
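For illustration, here is a minimal sketch of one way the repeated fill could be done without using the identifier as the index of df. It rests on assumptions: the toy data and the shortened cols list are hypothetical stand-ins for the real 30-column frame, and lookup is a helper name introduced only for this example. Each column's NaNs are filled by mapping 'Notification ID' through the de-duplicated df_conn, so every repeated row receives a value.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: 'Notification ID' repeats and some rows are missing values.
df = pd.DataFrame({
    'Notification ID': [1, 1, 2, 2, 3],
    'Principal Type': ['user', 'group', 'user', 'group', 'group'],
    'Category': [np.nan, 'A', np.nan, 'B', 'C'],
    'Title': [np.nan, 'T1', np.nan, 'T2', 'T3'],
})
cols = ['Category', 'Title']  # shortened stand-in for the real column list

# One row per identifier, built the same way as in the question.
df_conn = df[df['Principal Type'] != 'user'].drop_duplicates(subset='Notification ID')

# Lookup table keyed by the identifier; its index is unique after drop_duplicates.
lookup = df_conn.set_index('Notification ID')

df_result = df.copy()
for col in cols:
    # Map each row's identifier to the lookup value, then fill only the NaNs.
    fill_values = df_result['Notification ID'].map(lookup[col])
    df_result[col] = df_result[col].fillna(fill_values)

print(df_result)
```

Because the row index of df is left untouched, the number of rows stays the same, and values already present in df are not overwritten.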