One situation I sometimes encounter is, I have two dataframes (df1
, df2
) and I want to create a new dataframe (df3
) based on the intersection of multiple columns between df1
and df2
.
For example, I want to create df3
by filtering df1
by columns Campaign
and Group
.
import pandas as pd
df1 = pd.DataFrame({'Campaign':['Campaign 1', 'Campaign 2', 'Campaign 3', 'Campaign 3', 'Campaign 4'], 'Group':['Some group', 'Arbitrary Group', 'Group 1', 'Group 2', 'Done Group'], 'Metric':[245,91,292,373,32]}, columns=['Campaign', 'Group', 'Metric'])
df2 = pd.DataFrame({'Campaign':['Campaign 3', 'Campaign 3'], 'Group':['Group 1', 'Group 2'], 'Metric':[23, 456]}, columns=['Campaign', 'Group', 'Metric'])
df1
Campaign Group Metric
0 Campaign 1 Some group 245
1 Campaign 2 Arbitrary Group 91
2 Campaign 3 Group 1 292
3 Campaign 3 Group 2 373
4 Campaign 4 Done Group 32
df2
Campaign Group Metric
0 Campaign 3 Group 1 23
1 Campaign 3 Group 2 456
I know I can do this with merge...
df3 = df1.merge(df2, how='inner', on=['Campaign', 'Group'], suffixes=('','_del'))
#df3
Campaign Group Metric Metric_del
0 Campaign 3 Group 1 292 23
1 Campaign 3 Group 2 373 456
but then I have to figure out how to drop
columns that end with _del
. I guess this:
df3.select(lambda x: not re.search('_del', x), axis=1)
##The result I'm going for but required merge, then select (2-steps)
Campaign Group Metric
0 Campaign 3 Group 1 292
1 Campaign 3 Group 2 373
Questions
What I'm mainly interested in is returning df1
that's simply filtered on df2
's Campaign|Group
values.
Is there a better way to return
df1
without resorting tomerge
?Is there a way to
merge
but NOT returndf2
's columns to themerge
and returning onlydf1
's columns?