I need to merge two dataframes without creating duplicate columns. The first datframe (dfa) has missing values. The second dataframe (dfb) has unique values. This would be the same as a vlookup in Excel.
dfa looks like this:
postcode lat lon ...plus 32 more columns
M20 2.3 0.2
LS1 NaN NaN
LS1 NaN NaN
LS2 NaN NaN
M21 2.4 0.3
dfb only contains unique Postcodes and values where lat and lon were NaN in dfa. It looks like this:
postcode lat lon
LS1 1.4 0.1
LS2 1.5 0.2
The output I would like is:
postcode lat lon ...plus 32 more columns
M20 2.3 0.2
LS1 1.4 0.1
LS1 1.4 0.1
LS2 1.5 0.2
M21 2.4 0.3
I've tried using pd.merge like so:
outputdf = pd.merge(dfa, dfb, on='Postcode', how='left')
This results in duplicate columns being created:
postcode lat_x lon_x lat_y lat_x ...plus 32 more columns
M20 2.3 0.2 NaN NaN
LS1 NaN NaN 1.4 0.1
LS1 NaN NaN 1.4 0.1
LS2 NaN NaN 1.5 0.2
M21 2.4 0.3 NaN NaN
From this answer I tried using:
output = dfa
for df in [dfa, dfb]:
ouput.update(df.set_index('Postcode'))
But received the "ValueError: cannot reindex from a duplicate axis".
Also from the above answer this does not work:
output.merge(pd.concat([dfa, dfb]), how='left')
There are no duplicate columns but the values in 'Lat' and 'Lon' are still blank.
Is there a way to merge on 'Postcode' without duplicate columns being created; effectively performing a VLOOKUP using pandas?