Remove one DataFrame from another without removing duplicates

Asked Jul 12 '18 at 09:17

Active Jul 12 '18 at 16:54

I have a DataFrame df1 of shape [21, 4] and a DataFrame df2 of shape [10200, 4]. I wish to remove the rows of df1 from df2 so that its shape becomes [10179, 4].

I have seen many posts using the drop_duplicates function; however, I do not want to drop any duplicates within df2. I only want to remove the rows that appear in df1. I have tried

result = df1[~df1[['decel','accel','corner','vert']].apply(lambda x: np.in1d(x, df2).all(), axis=1)].reset_index(drop=True)

but with no success! Many thanks for all your help.

UPDATE: Using the code:

Xfinal = pd.merge(X, dropthese, on=['decel','accel','corner','vert'], how='outer', indicator=True).query("_merge != 'both'").drop('_merge', axis=1)

This lets me remove df1 from df2, but it reorders df2, grouping similar values together. Is there a way to keep the original order? Thanks.
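One possible explanation for the reordering is that `how='outer'` sorts the join keys, whereas `how='left'` keeps the key order of the left frame. The sketch below swaps in a left merge as an anti-join. The sample data is hypothetical; `X` stands in for df2 and `dropthese` for df1, and it is assumed that the four columns hold exactly comparable values (no NaN).

```python
import pandas as pd

cols = ['decel', 'accel', 'corner', 'vert']

# Hypothetical stand-in for df2: note the repeated rows, which must survive.
X = pd.DataFrame({
    'decel':  [3, 1, 3, 2, 1],
    'accel':  [0, 5, 0, 7, 5],
    'corner': [9, 8, 9, 6, 8],
    'vert':   [4, 4, 4, 4, 4],
})

# Hypothetical stand-in for df1: the rows to remove.
dropthese = pd.DataFrame({'decel': [1], 'accel': [5], 'corner': [8], 'vert': [4]})

# A left merge keeps the rows of X in their original order; the indicator
# column marks rows with no match in dropthese as 'left_only'.
Xfinal = (
    X.merge(dropthese, on=cols, how='left', indicator=True)
     .query("_merge == 'left_only'")
     .drop('_merge', axis=1)
     .reset_index(drop=True)
)

print(Xfinal)
```

Duplicate rows in `X` that do not appear in `dropthese` are kept, since nothing here calls `drop_duplicates`. Every row of `X` that matches a row of `dropthese` is removed, including repeated matches.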

edited Jul 12 '18 at 10:02

milo204

If the solution is not working, can you add a [minimal, complete, and verifiable example](http://stackoverflow.com/help/mcve)? – jezrael Jul 12 '18 at 09:20
Could you sort on the index of `df2` after you've removed the `df1` values to maintain the order? Or create some kind of key indicating the order you want to preserve before doing the merge? – vielkind Jul 12 '18 at 17:01
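The second suggestion in the comment above (record an order key before the merge, then sort on it afterwards) might look roughly like this. It is only a sketch: the sample frames are made up, and the helper column name `order` is an assumption rather than anything from the question.

```python
import pandas as pd

cols = ['decel', 'accel', 'corner', 'vert']

# Hypothetical data: df2 contains duplicates that should be preserved.
df2 = pd.DataFrame(
    [[7, 1, 2, 0], [4, 4, 4, 1], [7, 1, 2, 0], [5, 9, 3, 2], [4, 4, 4, 1]],
    columns=cols,
)
df1 = pd.DataFrame([[4, 4, 4, 1]], columns=cols)

# Record each row's original position in a helper column before merging.
keyed = df2.assign(order=range(len(df2)))

# Same outer merge as in the question; it may reorder the rows.
merged = keyed.merge(df1, on=cols, how='outer', indicator=True)

# Keep rows found only in df2, then restore the original order via the key.
result = (
    merged[merged['_merge'] == 'left_only']
    .sort_values('order')
    .drop(columns=['_merge', 'order'])
    .reset_index(drop=True)
)

print(result)
```

The helper column is dropped at the end, so the result has the same four columns as df2.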
