
I have a dataframe with 15 columns that is used to calculate a score. Two columns (A and B) are my independent variables, and both contain duplicate values. Column C holds the calculated score, and I have already sorted the dataframe by Column C in descending order. The goal is to keep the highest-scoring combination of columns A and B and drop any later rows that repeat one of its values.

Column A  Column B  Column C
       5        10       1.5
       5        12       1.4
      10        12       1.0
       7        14       0.9
       7         9       0.8
      12         6       0.7
      14         4       0.6

In the above example, I would want the second, third, fifth, sixth, and seventh rows dropped. The sixth and seventh rows would be dropped because 12 and 14 already appear in Column B in rows above.

mashit101

1 Answer


Use Series.duplicated

# Keep only rows whose value is a first occurrence within its own column, for both columns
res = df[~(df["Column A"].duplicated() | df["Column B"].duplicated())]
print(res)

Output (for the original five-row example, before the last two rows were added)

   Column A  Column B  Column C
0         5        10       1.5
3         7        14       0.9
Dani Mesejo
  • @mashit101 Glad I could help! Please consider marking it as accepted if you found my answer useful – Dani Mesejo Oct 13 '21 at 13:39
  • This is close! It removes all duplicates within each independent column, but it doesn't remove a value in Column A if it is already in Column B. In the above example, I still have the 3rd row showing when it should be removed, because both 10 and 12 are in previous rows – mashit101 Oct 13 '21 at 13:51
  • @mashit101 Sorry, but I cannot reproduce this. I ran the code given in the answer against the data provided in the example, and the output obtained was the one shown – Dani Mesejo Oct 13 '21 at 13:54
  • My example wasn't complete. I added two more rows and added to the explanation above. Thanks! – mashit101 Oct 14 '21 at 20:03
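
For the updated seven-row example, checking each column separately is not enough, since a value also has to count as seen when it appeared earlier in the other column (a per-column `duplicated` check would still keep the last two rows). The sketch below is one possible way to handle that and is not code from the answer above. It assumes the frame is already sorted by Column C descending, and the names `seen`, `keep`, `mask`, and `res` are illustrative only. It walks the rows in order, keeps a row only if neither of its values has appeared in any earlier row (kept or dropped), and then records both values.

```python
import pandas as pd

# Example data from the question, already sorted by Column C descending
df = pd.DataFrame({
    "Column A": [5, 5, 10, 7, 7, 12, 14],
    "Column B": [10, 12, 12, 14, 9, 6, 4],
    "Column C": [1.5, 1.4, 1.0, 0.9, 0.8, 0.7, 0.6],
})

seen = set()   # every value met so far in either column (illustrative name)
keep = []      # one boolean per row, in row order

for a, b in zip(df["Column A"], df["Column B"]):
    # Keep the row only if neither value appeared in an earlier row
    keep.append(a not in seen and b not in seen)
    # Record both values even when the row itself is dropped
    seen.update((a, b))

mask = pd.Series(keep, index=df.index)
res = df[mask]
print(res)
```

On the example table this keeps only the rows at index 0 and 3. A plain loop is used here because each decision depends on every earlier row in both columns.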