
This is my original data frame

I want to remove duplicate rows based on the column pair 'head_x'/'head_y' and, separately, on the column pair 'cost_x'/'cost_y'.

This is my code:

df = df.astype(str)

df.drop_duplicates(subset=['head_x', 'head_y'], keep=False, inplace=True)

df.drop_duplicates(subset=['cost_x', 'cost_y'], keep=False, inplace=True)

print(df)

This is the output DataFrame. As you can see, the first row is a duplicate on both subsets, so why is this row still there?

I do not just want to remove the first row, but all duplicates. This is another output, in which Index/Node 6 also has a duplicate.
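For reference, drop_duplicates compares values across rows, looking only at the columns listed in subset; the index (Node) is not part of the comparison. The keep parameter decides which members of a duplicate group survive: 'first' (the default) keeps the first one, while keep=False drops all of them. A minimal sketch on a small hypothetical frame (the values are made up for illustration, not the asker's data), using duplicated to see which rows would be removed before dropping anything:

```python
import pandas as pd

# Hypothetical data: the two Node 6 rows share the same (head_x, head_y) pair.
df = pd.DataFrame(
    {"head_x": ["2", "5", "5", "9"], "head_y": ["3", "7", "7", "1"]},
    index=pd.Index([1, 6, 6, 8], name="Node"),
)
cols = ["head_x", "head_y"]

# duplicated() returns a boolean mask; with keep=False it flags every member
# of each duplicate group, i.e. the rows drop_duplicates(keep=False) removes.
mask = df.duplicated(subset=cols, keep=False)
print(mask.tolist())  # [False, True, True, False]

# keep="first" keeps one of the Node 6 rows; keep=False removes both.
first = df.drop_duplicates(subset=cols, keep="first")
none_kept = df.drop_duplicates(subset=cols, keep=False)
print(len(first), len(none_kept))  # 3 2
```

Printing the mask this way is a quick check for "why is this row still there": if a row shows False, pandas found no other row with the same values in those columns.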

KristelK
  • Please do not post images of data. This makes it more difficult for people to help you! Just paste it in as text. Take a look at this: https://stackoverflow.com/help/how-to-ask – Dave May 06 '20 at 13:40

2 Answers

df = df.astype(str)

# inplace=True would make drop_duplicates return None, so reassign instead
df = df.drop_duplicates(subset=['head_x', 'head_y'], keep=False)

df = df.drop_duplicates(subset=['cost_x', 'cost_y'], keep=False)

I assume that cost_x should be replaced with head_y; otherwise there are no duplicates.
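One pitfall worth knowing here: drop_duplicates(..., inplace=True) modifies the frame itself and returns None, so writing df = df.drop_duplicates(..., inplace=True) would replace df with None. A minimal sketch on a tiny made-up frame contrasting the two styles:

```python
import pandas as pd

# Made-up data: the first two rows share the same (head_x, head_y) pair.
data = {"head_x": ["2", "2", "4"], "head_y": ["3", "3", "5"]}
cols = ["head_x", "head_y"]

# inplace=True mutates the frame and returns None.
df = pd.DataFrame(data)
result = df.drop_duplicates(subset=cols, keep=False, inplace=True)
print(result)   # None
print(len(df))  # 1, only the ("4", "5") row is left

# Without inplace, the method returns a new frame that can be reassigned.
df2 = pd.DataFrame(data)
df2 = df2.drop_duplicates(subset=cols, keep=False)
print(len(df2))  # 1
```

Either style works on its own; the bug only appears when the two are combined.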

sygneto

Take a look at the first 2 rows:

      head_x  cost_x  head_y  cost_y
Node
1          2       6       2       3
1          2       6       3       4

Start from head_x and head_y:

  • in the first row they are 2 and 2,
  • in the second row they are 2 and 3,

so these two pairs are different.

Then look at cost_x and cost_y:

  • in the first row they are 6 and 3,
  • in the second row they are 6 and 4,

so these two pairs are also different.

Conclusion: These 2 rows are not duplicates, taking into account both column subsets.
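The same check can be reproduced in pandas. A minimal sketch that rebuilds just the two rows shown above (as strings, since the question calls astype(str)) and asks pandas which rows it considers duplicates on each subset:

```python
import pandas as pd

# The first two rows from the output above, stored as strings.
df = pd.DataFrame(
    {
        "head_x": ["2", "2"],
        "cost_x": ["6", "6"],
        "head_y": ["2", "3"],
        "cost_y": ["3", "4"],
    },
    index=pd.Index([1, 1], name="Node"),
)

# Neither column pair repeats between the two rows, so nothing is flagged.
head_dups = df.duplicated(subset=["head_x", "head_y"], keep=False)
cost_dups = df.duplicated(subset=["cost_x", "cost_y"], keep=False)
print(head_dups.tolist())  # [False, False]
print(cost_dups.tolist())  # [False, False]

# Running the question's two steps therefore keeps both rows.
out = df.drop_duplicates(subset=["head_x", "head_y"], keep=False)
out = out.drop_duplicates(subset=["cost_x", "cost_y"], keep=False)
print(len(out))  # 2
```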

Valdi_Bo