
I have the following Series, produced by calling stack() on a DataFrame to get the desired layout:

col1  col2
A     GS      0.522696
F     GS      0.422812
GS    A       0.522696
      F       0.422812

In the above example, the rows (A, GS) = 0.522696 and (GS, A) = 0.522696 are considered the same, so I need to filter out one of them. The same goes for (F, GS) = 0.422812 and (GS, F) = 0.422812.

Essentially, every row is duplicated with col1 and col2 swapped, while the corresponding float value stays the same (i.e., (GS, F) is a duplicate of (F, GS)). I therefore need to filter out the 'duplicate'. It doesn't matter which one gets filtered out; I just need the result of the above example to include only two rows.

I've tried converting the structure into a dict with Series.to_dict(), just to see whether it would be easier to work with, which results in:

{('GS', 'F'): 0.422812, ('A', 'GS'): 0.522696,
('F', 'GS'): 0.422812, ('GS', 'A'): 0.522696} 

But I still haven't had any luck, regardless of whether it is a Series or a dict.

darkpool
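One way to drop the mirrored pairs is to give each pair a canonical order (here, sorted) and keep only the first pair seen for each canonical key. This is a minimal plain-Python sketch that assumes the dict shown in the question; the function name `drop_mirrored_pairs` is hypothetical:

```python
def drop_mirrored_pairs(pairs):
    """Keep one of (a, b) / (b, a); the first one encountered wins."""
    seen = set()
    result = {}
    for (a, b), value in pairs.items():
        key = tuple(sorted((a, b)))  # (GS, F) and (F, GS) both map to (F, GS)
        if key not in seen:
            seen.add(key)
            result[(a, b)] = value
    return result


data = {('GS', 'F'): 0.422812, ('A', 'GS'): 0.522696,
        ('F', 'GS'): 0.422812, ('GS', 'A'): 0.522696}
print(drop_mirrored_pairs(data))
# {('GS', 'F'): 0.422812, ('A', 'GS'): 0.522696}
```

Because the canonical key is built from the pair itself rather than from the value, two unrelated pairs that happen to share a value are both kept.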
  • To remove duplicate values from a dict, iterate through its key-value pairs and add each one to a new dict only if the value is not already among its values. For code to do this, see the answer by Andrew Cox in http://stackoverflow.com/questions/8749158/removing-duplicates-from-dictionary. –  Aug 01 '15 at 09:50
  • You may use new_dict = {v: k for k, v in dict.items()} – mmachine Aug 01 '15 at 09:57
  • Duplicated pairs can be avoided in the first place if you select correlation coefficients only from the upper or lower triangle of the matrix. – Jianxun Li Aug 01 '15 at 10:00
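The last comment's suggestion can be sketched in plain Python: if the Series comes from stacking a symmetric matrix (such as a correlation matrix), reading only the entries strictly above the diagonal (row index i < column index j) yields each unordered pair exactly once. The labels and matrix below are hypothetical; in particular the A/F entry is made up for illustration, and `upper_triangle_pairs` is not a library function:

```python
def upper_triangle_pairs(labels, matrix):
    """Return {(row_label, col_label): value} for entries strictly above the diagonal."""
    pairs = {}
    n = len(labels)
    for i in range(n):
        for j in range(i + 1, n):  # j > i skips the diagonal and the mirrored lower half
            pairs[(labels[i], labels[j])] = matrix[i][j]
    return pairs


# Hypothetical symmetric matrix; the A/F value 0.1 is illustrative only.
labels = ['A', 'F', 'GS']
matrix = [
    [1.0,      0.1,      0.522696],
    [0.1,      1.0,      0.422812],
    [0.522696, 0.422812, 1.0],
]
print(upper_triangle_pairs(labels, matrix))
# {('A', 'F'): 0.1, ('A', 'GS'): 0.522696, ('F', 'GS'): 0.422812}
```

In pandas, the same idea is typically applied by masking out the lower triangle (for example with a boolean mask built from `numpy.triu`) before calling stack().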

2 Answers


You can delete duplicates in the dict:

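result_dict = s.to_dict()  # s is the stacked Series from the question
for a, b in s.index:  # iterate over the (col1, col2) index pairs, not the values
    # keep (a, b) and drop its mirror (b, a), unless (a, b) was itself already dropped
    if (a, b) in result_dict and a != b:
        result_dict.pop((b, a), None)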
301_Moved_Permanently
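A self-contained variant of this answer's approach that works on the dict from the question alone, without the original Series. It iterates over a snapshot of the keys (`list(d)`), because deleting from a dict while iterating over it directly raises a `RuntimeError`. The function name `remove_mirrors` is hypothetical:

```python
def remove_mirrors(d):
    """Delete (b, a) whenever (a, b) is still present; mutates and returns d."""
    for a, b in list(d):  # snapshot of the keys, since we delete from d inside the loop
        if (a, b) in d and a != b:  # skip keys already removed as mirrors; keep (x, x)
            d.pop((b, a), None)  # no error if the mirror is absent
    return d


data = {('GS', 'F'): 0.422812, ('A', 'GS'): 0.522696,
        ('F', 'GS'): 0.422812, ('GS', 'A'): 0.522696}
print(remove_mirrors(data))
# {('GS', 'F'): 0.422812, ('A', 'GS'): 0.522696}
```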

You may use a dictionary comprehension to avoid value repetition:

new_dict = {v: k for k, v in old_dict.items()}
mmachine
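The comprehension in the second answer inverts the mapping (values become keys), so it keeps one pair per distinct value rather than one pair per mirrored couple: two unrelated pairs that happen to have the same value would also be merged. The sketch below inverts twice to get back the pair-to-value shape and makes that caveat visible; the extra ('A', 'F') entry is hypothetical:

```python
old_dict = {('GS', 'F'): 0.422812, ('A', 'GS'): 0.522696,
            ('F', 'GS'): 0.422812, ('GS', 'A'): 0.522696}

# value -> last pair seen with that value (later keys overwrite earlier ones)
by_value = {v: k for k, v in old_dict.items()}
# invert again to get back the pair -> value shape of the original Series
deduped = {k: v for v, k in by_value.items()}
print(deduped)
# {('F', 'GS'): 0.422812, ('GS', 'A'): 0.522696}

# Caveat: a hypothetical unrelated pair with a repeated value is lost too.
with_collision = dict(old_dict)
with_collision[('A', 'F')] = 0.522696
print(len({v: k for k, v in with_collision.items()}))
# 2  (three distinct unordered pairs, but only two distinct values)
```

This works for the example in the question only because each mirrored couple happens to have a value no other pair shares.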