My dataframe looks like this:
import pandas as pd

df_in = pd.DataFrame(data={'mol1': ['cpd1', 'cpd2', 'cpd3'], 'mol2': ['cpd2', 'cpd1', 'cpd4'], 'sim': [0.8, 0.8, 0.9]})
print(df_in)
   mol1  mol2  sim
0  cpd1  cpd2  0.8
1  cpd2  cpd1  0.8
2  cpd3  cpd4  0.9
The pair (cpd1, cpd2) occurs twice, although the second time its elements are in swapped columns.
I would like to get rid of these duplicates to end up with this:
df_out = pd.DataFrame(data={'mol1':['cpd1', 'cpd3'], 'mol2': ['cpd2', 'cpd4'], 'sim': [0.8,0.9]})
print(df_out)
   mol1  mol2  sim
0  cpd1  cpd2  0.8
1  cpd3  cpd4  0.9
If I ignore the third column, there is a solution described in "Pythonic way of removing reversed duplicates in list", but I have to preserve this column.
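
For reference, one possible approach is sketched below. It is not taken from the linked question. It assumes that reversed pairs always carry the same 'sim' value, so keeping the first occurrence is acceptable. The names key and df_dedup are only illustrative.

```python
import numpy as np
import pandas as pd

df_in = pd.DataFrame(data={'mol1': ['cpd1', 'cpd2', 'cpd3'],
                           'mol2': ['cpd2', 'cpd1', 'cpd4'],
                           'sim': [0.8, 0.8, 0.9]})

# Sort the two names within each row so that (cpd2, cpd1) and (cpd1, cpd2)
# produce the same key. The original columns are left unchanged.
key = pd.DataFrame(np.sort(df_in[['mol1', 'mol2']].to_numpy(), axis=1),
                   index=df_in.index)

# Keep the first row for each unordered pair (assumption: 'sim' is identical
# for both orientations); the 'sim' column is carried along untouched.
df_dedup = df_in[~key.duplicated()].reset_index(drop=True)
print(df_dedup)
```

If the two orientations could have different 'sim' values, which row to keep would have to be decided explicitly, for example by sorting on 'sim' before filtering.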