I have two dataframes, df1 and df2, with 30 columns each. In df1, a set of 7 columns is filled with np.nan. I want to use the entries in those same 7 columns of df2 to fill in the NaNs in the respective columns of df1. To make sure the NaNs in df1 are filled in correctly, I would like to match rows on a unique identifier that is available in both df1 and df2. Keep in mind that this identifier is not the index, since df1 contains multiple repeats of it. One thing I ran into is that the methods I have tried only allow for one fill, which is not what I want.

EDIT:

First, here are the columns that I want to eventually fill in:


cols = ['Analytics Source 1', 'User ID', 'User Email', 'Category', 'Source Title', 'Title', 'Date Created', 'Date Effective Start', 'Date Effective End']

Next, I made a dataframe that contains each unique identifier once. The only difference between the actual dataframe df and the one below is that df has a high number of repeated unique identifiers.

df_conn = df[df['Principal Type'] != 'user']
df_conn = df_conn.drop_duplicates(subset='Notification ID')

Next, I want to fill in df with the values from df_conn, which should (in theory) populate df throughout, no matter how many repeated unique identifiers there are in df.

df_result = df.set_index('Notification ID').combine_first(df_conn.set_index('Notification ID'))
df_result = df_result.reset_index()
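
For reference, here is a minimal sketch of the behaviour I am after, written with a per-column lookup instead of combine_first. The toy data and the name lookup are made up for illustration; only 'Notification ID', 'Principal Type', 'Category' and 'Title' come from my real dataframe. It assumes df has a unique index (for example the default RangeIndex) and that df_conn has exactly one row per 'Notification ID'.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: repeated identifiers, NaNs on the 'user' rows.
df = pd.DataFrame({
    "Notification ID": [1, 1, 2, 2, 2, 3],
    "Principal Type": ["conn", "user", "user", "conn", "user", "user"],
    "Category": ["a", np.nan, np.nan, "b", np.nan, np.nan],
    "Title": ["t1", np.nan, np.nan, "t2", np.nan, np.nan],
})
cols = ["Category", "Title"]  # shortened stand-in for the full column list

# One row per identifier, taken from the non-'user' rows (as in the question).
df_conn = df[df["Principal Type"] != "user"].drop_duplicates(subset="Notification ID")
lookup = df_conn.set_index("Notification ID")

df_result = df.copy()
for col in cols:
    # Map every row's identifier to the value stored in the lookup table,
    # then use that value only where the original entry is NaN.
    fill_values = df_result["Notification ID"].map(lookup[col])
    df_result[col] = df_result[col].fillna(fill_values)

print(df_result)
```

With this approach every repeated 'Notification ID' row receives the same fill value, and identifiers that do not appear in df_conn (ID 3 in the toy data) stay NaN.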
Matt_Davis