I have a somewhat big pandas dataframe (100,000x9). The first two columns are a combination of names associated with a value (in both sides). I want to delete the lower value associated with a given combination.
I haven't tried anything yet, because I'm not sure how to tackle this problem. My first impression is that I need to use the apply function over the data frame, but I need to select each combination of 'first' and 'second', compare them and then delete that row.
df = pd.DataFrame(np.array([['John','Mary',5],['John','Mark',1], ['Mary','John',2], ['Mary','Mark',1], ['Mark','John',3], ['Mark','Mary',5]]), columns=['first','second','third'])
df
first second third
0 John Mary 5
1 John Mark 1
2 Mary John 2
3 Mary Mark 1
4 Mark John 3
5 Mark Mary 5
My objective is to get this data frame
df_clean = pd.DataFrame(np.array([['John','Mary',5], ['Mark','John',3], ['Mark','Mary',5]]), columns=['first','second','third'])
df_clean
first second third
0 John Mary 5
1 Mark John 3
2 Mark Mary 5
Any ideas?