DataFrame drop Method is dropping all rows despite selecting subset

Question

I have a df of invoices, but only the following two columns really matter:

 OrderNum Id . . . . 
    586  270  
    588  270
    590  270
    590  270

Both OrderNum and Id are int64.

I am trying to drop duplicate order numbers, but for whatever reason the following code deletes all of the rows that have duplicates instead of keeping one of each:

df = df.drop_duplicates(subset = ['OrderNum'], keep = 'last',inplace=False)

I don't know if I'm using the method incorrectly, but I can't figure out why this happens.

Expected result:

OrderNum Id . . . 
    586  270
    588  270
    590  270

Will you please post the df that you're getting to the question? It seems to work for me. — , Dec 20 '21 at 20:17
Can you try `df = df[~df.duplicated('OrderNum', keep='last')]`, please? — Corralien, Dec 20 '21 at 20:24
{'orderNumber': {468: 586491, 472: 590378, 476: 590378, 480: 588237}, 'customerId': {468: 27037, 472: 27037, 476: 27037, 480: 27037}} dict — S44, Dec 20 '21 at 20:25
For your dictionary, your code works. Try to export your dataframe to csv and reload it. — Corralien, Dec 20 '21 at 20:31
I don't think it has to do with datatype. Restart jupyter kernel please, you probably have some misleading variables saved. Code works fine. — Patryk Kowalski, Dec 20 '21 at 20:33

score 0 · Answer 1 · answered Dec 20 '21 at 20:33

The following worked for me in an IPython shell, using the command you shared in your question:

In [1]: import pandas as pd

In [2]: test_df = pd.DataFrame({
   ...:     'orderNumber': {468: 586491, 472: 590378, 476: 590378, 480: 588237},
   ...:     'customerId': {468: 27037, 472: 27037, 476: 27037, 480: 27037},
   ...: })

In [3]: test_df
Out[3]:
     orderNumber  customerId
468       586491       27037
472       590378       27037
476       590378       27037
480       588237       27037

In [4]: test_df = test_df.drop_duplicates(subset=['orderNumber'], keep='last')

In [5]: test_df
Out[5]:
     orderNumber  customerId
468       586491       27037
476       590378       27037
480       588237       27037
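
The thread does not establish what went wrong on the asker's side, so the following is only a sketch of one thing worth ruling out: `keep=False` removes every row whose key is duplicated, which looks a lot like "all the duplicate rows got deleted", while `keep='last'` retains one row per key. It reuses the dictionary posted in the comments; the variable names `last`, `none_kept`, and `masked` are made up for illustration. It also shows that the boolean-mask form suggested in the comments gives the same result as `keep='last'`.

```python
import pandas as pd

# Same data as the dictionary posted in the comments.
df = pd.DataFrame({
    'orderNumber': {468: 586491, 472: 590378, 476: 590378, 480: 588237},
    'customerId': {468: 27037, 472: 27037, 476: 27037, 480: 27037},
})

# keep='last' retains one row per orderNumber (the last occurrence).
last = df.drop_duplicates(subset=['orderNumber'], keep='last')

# keep=False discards every row whose orderNumber appears more than once.
none_kept = df.drop_duplicates(subset=['orderNumber'], keep=False)

# Boolean-mask form from the comments; selects the same rows as keep='last'.
masked = df[~df.duplicated('orderNumber', keep='last')]

print(last.index.tolist())       # [468, 476, 480]
print(none_kept.index.tolist())  # [468, 480]
print(masked.equals(last))       # True
```

If your real DataFrame behaves like `none_kept` even with `keep='last'`, the problem is probably in the surrounding code or in stale notebook state rather than in this call.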
