I have a sample dataframe like this:

import pandas as pd

data = {'From': ['1', '1', '2', '2'],
        'To': ['2', '4', '1', '7'],
        'Total': [100, 100, 100, 100]
        }

dataf = pd.DataFrame(data, columns = ['From', 'To', 'Total'])

Which looks like this:

  From To  Total
0    1  2    100
1    1  4    100
2    2  1    100
3    2  7    100

The first and third rows are duplicates of each other, just with From and To swapped, i.e. (1,2) and (2,1).

I want to remove the duplicated rows, but add their Total values onto whichever row is left.

That is, the (1,2) and (2,1) rows would collapse into a single row with a Total of 200, and the other two rows would stay as they are.

I have found posts that discuss how to remove duplicates regardless of column order (e.g. "Grouping by multiple columns to find duplicate rows pandas"), but nothing about how to sum the values before removing them.

Nicholas
  • I was ready to post an answer that gets you the result, right before it was closed. Let me know if you need it, it works. – sophocles Jan 27 '21 at 11:18
  • Thank you sophocles, going to check Maria's answer and will let you know if I do :) Edit: Maria's answer does what I need. Thank you for your offer of help though! – Nicholas Jan 27 '21 at 11:25

1 Answer

Try creating an order-independent key for each row, then group by that key and sum:

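# Build an order-independent key: (1, 2) and (2, 1) both become ('1', '2')
dataf['key'] = dataf.apply(lambda row: tuple(sorted((row['From'], row['To']))), axis=1)
# Sum Total over all rows that share the same key
dataf = dataf.groupby('key')['Total'].sum().reset_index()
# Split the key back into the From and To columns
dataf['From'] = dataf['key'].apply(lambda x: x[0])
dataf['To'] = dataf['key'].apply(lambda x: x[1])
del dataf['key']
# Restore the original column order
dataf = dataf[['From', 'To', 'Total']]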
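
For larger frames, the same idea can be sketched without a row-wise apply. This is only an alternative under the assumption that From and To share a comparable dtype (both strings here); the names pairs and result are made up for the illustration. It sorts each (From, To) pair with NumPy and groups on the two sorted columns directly:

```python
import numpy as np
import pandas as pd

dataf = pd.DataFrame({'From': ['1', '1', '2', '2'],
                      'To': ['2', '4', '1', '7'],
                      'Total': [100, 100, 100, 100]})

# Sort each (From, To) pair within its row, so ('2', '1') becomes ('1', '2')
pairs = np.sort(dataf[['From', 'To']].to_numpy(), axis=1)

# Overwrite From/To with the sorted pair, then sum Total per pair
result = (dataf.assign(From=pairs[:, 0], To=pairs[:, 1])
               .groupby(['From', 'To'], as_index=False)['Total']
               .sum())

print(result)
```

Because the values are strings, the sort inside each pair is lexicographic ('10' comes before '2'). That does not affect which rows get merged, only which of the two values ends up in From.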
maria