Finding duplicate rows (combinations) and merge and sum

Question

I have a sample dataframe like this:

import pandas as pd

data = {'From': ['1', '1', '2', '2'],
        'To': ['2', '4', '1', '7'],
        'Total': [100, 100, 100, 100]}

dataf = pd.DataFrame(data, columns=['From', 'To', 'Total'])

Which looks like this:

  From To  Total
0    1  2    100
1    1  4    100
2    2  1    100
3    2  7    100

The first and third rows are duplicates of each other, just with the values in a different order, i.e. (1,2) and (2,1).

I want to remove the duplicated rows, but add their Total values onto whichever row is left.

i.e. the table would look like this:

From To  Total
   1  2    200
   1  4    100
   2  7    100

I have found posts that discuss how to remove duplicates regardless of column order (e.g. "Grouping by multiple columns to find duplicate rows pandas"), but nothing about how to sum the values before removing them.

I was ready to post an answer that gets you the result, right before the question was closed. Let me know if you need it; it works. - sophocles, Jan 27 '21 at 11:18
Thank you sophocles, going to check Maria's answer and will let you know if I do :) Edit: Maria's answer does what I need. Thank you for your offer of help though! - Nicholas, Jan 27 '21 at 11:25

score 1 · Accepted Answer · answered Jan 27 '21 at 11:14

Try creating an order-independent key for each row, then group on that key and sum:

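# Sorted tuple of (From, To), so (1, 2) and (2, 1) get the same key
dataf['key'] = dataf.apply(lambda row: tuple(sorted((row['From'], row['To']))), axis=1)
# Sum Total within each key
dataf = dataf.groupby('key')['Total'].sum().reset_index()
# Split the key back into From and To
dataf['From'] = dataf['key'].apply(lambda x: x[0])
dataf['To'] = dataf['key'].apply(lambda x: x[1])
# Drop the helper key and restore the original column order
dataf = dataf[['From', 'To', 'Total']]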
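
If the frame is large, the row-wise apply above can get slow. A possible alternative, sketched here under the assumption that From and To hold the same type (strings in the question's sample), is to sort each pair with NumPy and group on the sorted columns directly. The names pairs and result are only for illustration and are not from the original answer.

```python
import numpy as np
import pandas as pd

data = {'From': ['1', '1', '2', '2'],
        'To': ['2', '4', '1', '7'],
        'Total': [100, 100, 100, 100]}
dataf = pd.DataFrame(data, columns=['From', 'To', 'Total'])

# Sort each (From, To) pair within its row, so ('2', '1') becomes ('1', '2').
# Assumption: both columns hold mutually comparable values (here, strings).
pairs = np.sort(dataf[['From', 'To']].to_numpy(), axis=1)

# Overwrite From/To with the sorted pair, then sum Total per pair.
result = (
    dataf.assign(From=pairs[:, 0], To=pairs[:, 1])
         .groupby(['From', 'To'], as_index=False)['Total']
         .sum()
)
print(result)
```

On the sample data this gives three rows: (1, 2) with 200, (1, 4) with 100 and (2, 7) with 100. Note that strings are compared lexicographically ('10' sorts before '2'), which still pairs duplicates correctly but may put the "larger" number in From; converting the columns to integers first avoids that if it matters.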
