How to remove rows from pandas dataframe that contain combinations of the first two columns

Question

I am trying to get rid of the rows that contain element combinations of the first two columns in a pandas dataframe, for instance, in the next df:

     event1    event2    uccs   ulags
0  327423.0  329243.0  0.1663 -0.6013
1  327423.0  329589.0  0.1911 -0.4730
2  329243.0  327423.0  0.1663  0.6013
3  329243.0  329589.0  0.3101 -0.7502
4  329589.0  327423.0  0.1911  0.4730
5  329589.0  329243.0  0.3101  0.7502

rows 0 and 2 present a combination of elements: event1 and event2. That is:

0  327423.0  329243.0
2  329243.0  327423.0

In general, I need to reduce the matrix or df to 3 rows, removing all duplicates.

I have tried the next without success:

 u = df.filter(like='event1').values
 m = pd.DataFrame(np.sort(u, axis=1)).duplicated()

doing that I get:

    event1    event2    uccs   ulags
0  327423.0  329243.0  0.1663 -0.6013
2  329243.0  327423.0  0.1663  0.6013
4  329589.0  327423.0  0.1911  0.4730

However as you can see, rows 0 and 2 are duplicated. Any tips about how to do this in pandas or numpy will be appreciated.

Erfan · Accepted Answer · 2019-07-03T20:50:05.840

2

If I understand you correctly, you were quite close, you just have to pass your boolean array back to your dataframe for boolean indexing:

df[~pd.DataFrame(np.sort(df.filter(like='event'), axis=1)).duplicated()]

     event1    event2    uccs   ulags
0  327423.0  329243.0  0.1663 -0.6013
1  327423.0  329589.0  0.1911 -0.4730
3  329243.0  329589.0  0.3101 -0.7502

Notice the ~ I used which is inverse or the not operator:

m = pd.DataFrame(np.sort(df.filter(like='event'), axis=1)).duplicated()

print(m)

0    False
1    False
2     True
3    False
4     True
5     True
dtype: bool

Now inverse:

print(~m)

0     True
1     True
2    False
3     True
4    False
5    False
dtype: bool

edited Jul 03 '19 at 20:50

answered Jul 03 '19 at 20:36

Erfan

40,971
8
66
78

Thank you very much Erfan. This solves my problem. I appreciate it! – echaves Jul 03 '19 at 20:47
Glad I could help. Added a small but important explanation just for your convenience. @echaves – Erfan Jul 03 '19 at 20:50
Nice answer (-: – piRSquared Jul 03 '19 at 21:13
Compliment from mr pirsquared, im flattered, thank you sir @piRSquared – Erfan Jul 03 '19 at 21:16

How to remove rows from pandas dataframe that contain combinations of the first two columns

1 Answers1