I have a DataFrame of invoices, but only the following two columns really matter:

 OrderNum Id . . . . 
    586  270  
    588  270
    590  270
    590  270

Both OrderNum and Id are int64.

I am trying to drop duplicate order numbers, but for whatever reason the following code deletes all of the rows when duplicates exist:

df = df.drop_duplicates(subset=['OrderNum'], keep='last', inplace=False)

I don't know if I'm using the method incorrectly, but I can't seem to figure out why.

Expected result:

OrderNum Id . . . 
    586  270
    588  270
    590  270
S44

1 Answer

The following worked for me in an IPython shell, using the command you shared in your question:

In [1]: import pandas as pd

In [2]: test_df = pd.DataFrame({
   ...:     'orderNumber': {468: 586491, 472: 590378, 476: 590378, 480: 588237},
   ...:     'customerId': {468: 27037, 472: 27037, 476: 27037, 480: 27037},
   ...: })

In [3]: test_df
Out[3]:
     orderNumber  customerId
468       586491       27037
472       590378       27037
476       590378       27037
480       588237       27037

In [4]: test_df = test_df.drop_duplicates(subset=['orderNumber'], keep='last')

In [5]: test_df
Out[5]:
     orderNumber  customerId
468       586491       27037
476       590378       27037
480       588237       27037
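
For comparison, here is a minimal sketch of how the `keep` argument changes the result. The frame below is rebuilt from the sample values in the question, so it is an assumption about what the real data looks like. It also shows `keep=False`, which is the setting that removes every row involved in a duplicate; whether something like that is happening in your actual code is only a guess, since the command as posted uses `keep='last'`.

```python
import pandas as pd

# Frame rebuilt from the sample rows in the question (assumed, not the real data).
df = pd.DataFrame({
    "OrderNum": [586, 588, 590, 590],
    "Id": [270, 270, 270, 270],
})

# keep='last' retains the final occurrence of each duplicated OrderNum.
last_kept = df.drop_duplicates(subset=["OrderNum"], keep="last")

# keep=False discards every row whose OrderNum appears more than once.
none_kept = df.drop_duplicates(subset=["OrderNum"], keep=False)

# duplicated(keep=False) flags all rows involved in a duplicate,
# which is useful for inspecting them before dropping anything.
dup_mask = df.duplicated(subset=["OrderNum"], keep=False)

print(last_kept)
print(none_kept)
print(df[dup_mask])
```

If the output of `drop_duplicates` on your real frame looks like `none_kept` rather than `last_kept`, it may be worth checking whether the code that actually runs matches the line you posted.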
oh_my_lawdy