Checking for duplicates in a series/dict

Question

I have the following Series, which is the result of calling stack() on a DataFrame:

col1  col2
A     GS      0.522696
F     GS      0.422812
GS    A       0.522696
      F       0.422812

In the above example, the rows (A,GS) = 0.522696 and (GS,A) = 0.522696 are considered to be the same so I need to filter out one of them. The same goes for (F,GS) = 0.422812 and (GS,F) = 0.422812.

Essentially what is happening is that every row will be duplicated in the sense that col1 and col2 will be reversed, but the corresponding float value is the same. (ie: GS,F is a duplicate of F,GS). I therefore need to filter out the 'duplicate'. It doesn't matter which one gets filtered out, I just need the result of the above example to only include two rows.

I've tried to change the structure into a dict just to see if it will be easier to work with, ie: Series.to_dict(), which results in:

{('GS', 'F'): 0.422812, ('A', 'GS'): 0.522696,
('F', 'GS'): 0.422812, ('GS', 'A'): 0.522696}

But I still haven't had any luck, regardless of whether the data is in a Series or a dict.

To remove duplicate values from a dict, just iterate through its key, value pairs and add them to a new dict only if the value is not already in its values. For code to do this see answer by Andrew Cox in http://stackoverflow.com/questions/8749158/removing-duplicates-from-dictionary. — , Aug 01 '15 at 09:50
Duplicated pairs can be avoided in the first place if you select correlation coefficients only from the upper or lower triangle of the matrix. — Jianxun Li, Aug 01 '15 at 10:00
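
The upper-triangle suggestion in the comment above could be sketched roughly as follows. It assumes the Series came from stacking a symmetric matrix (for example the output of DataFrame.corr()); the matrix `corr`, its labels, and the 0.3 entry are made up for illustration and are not from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical symmetric matrix standing in for the original DataFrame.
labels = ["A", "F", "GS"]
corr = pd.DataFrame(
    [[1.0, 0.3, 0.522696],
     [0.3, 1.0, 0.422812],
     [0.522696, 0.422812, 1.0]],
    index=labels,
    columns=labels,
)

# Boolean mask that is True strictly above the diagonal (k=1 skips the diagonal).
mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)

# where() sets everything outside the mask to NaN; after stack(), dropna()
# removes those entries so each unordered pair appears exactly once.
upper = corr.where(mask).stack().dropna()
```

With this approach the reversed pairs never enter the Series, so no later filtering step is needed.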

score 0 · Answer 1 · answered Aug 01 '15 at 09:51

You can delete duplicates in the dict:

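# `Series` is the stacked Series from the question
result_dict = Series.to_dict()
for col1, col2 in list(result_dict):  # iterate over a snapshot of the keys
    # remove the reversed key, unless this key was itself already removed
    if (col1, col2) in result_dict and col1 != col2:
        result_dict.pop((col2, col1), None)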
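
A variant that builds a new dict instead of deleting from the existing one might look like the sketch below. It treats each key as an unordered pair via frozenset; the names `pairs`, `seen`, and `deduped` are illustrative, and the data simply copies the dict shown in the question.

```python
# Data copied from the dict printed in the question.
pairs = {
    ("GS", "F"): 0.422812,
    ("A", "GS"): 0.522696,
    ("F", "GS"): 0.422812,
    ("GS", "A"): 0.522696,
}

seen = set()   # unordered pairs that have already been kept
deduped = {}   # keeps the first orientation encountered for each pair
for (col1, col2), value in pairs.items():
    # ('A', 'GS') and ('GS', 'A') produce the same frozenset.
    key = frozenset((col1, col2))
    if key not in seen:
        seen.add(key)
        deduped[(col1, col2)] = value
```

Because the comparison is on the pair itself rather than on the float, two unrelated pairs that happen to share a value are both kept.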

301_Moved_Permanently

score 0 · Accepted Answer · answered Aug 01 '15 at 13:57

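You can use a dictionary comprehension that swaps keys and values. Because dict keys are unique, pairs that share a value collapse into a single entry (the result maps each value to one of its pairs):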

new_dict = {v: k for k,v in old_dict.items()}
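
As a rough illustration of what the comprehension above produces, the sketch below inverts the question's dict and then inverts it again to get back a pair-to-value mapping. The names `inverted` and `restored` are illustrative. Note the assumption this relies on: distinct pairs must have distinct values, otherwise unrelated pairs would be merged as well.

```python
# Data copied from the dict printed in the question.
old_dict = {
    ("GS", "F"): 0.422812,
    ("A", "GS"): 0.522696,
    ("F", "GS"): 0.422812,
    ("GS", "A"): 0.522696,
}

# Values become keys; a later pair with the same value overwrites the earlier one.
inverted = {v: k for k, v in old_dict.items()}

# Swap back to recover a (col1, col2) -> value mapping with one entry per value.
restored = {k: v for v, k in inverted.items()}
```

Which orientation of each pair survives depends on iteration order, which matches the question's statement that either one may be dropped.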

mmachine
