I'm simply trying to remove duplicates from a csv and then make a new csv file with only the first column and no duplicates.

My terminal output shows it working, but the new CSV file still contains all the rows.

import pandas as pd
import numpy as np

#df = pd.read_csv('directory.csv',index_col=0,usecols=["From"]),
d = pd.read_csv('directory.csv')
df = pd.DataFrame(d, columns=['From'])


print(
    """
    
    
-----this is all phone numbers in header FROM-----


    """
)

print(df)
print(
    """


-----this is only unique values ----


    """
)

df = df.drop_duplicates(subset="From", keep="first", inplace=True)
print(df)

print(
    """


-----now saving to new csv----


    """
)

df.to_csv("uniquePhones.csv")

Terminal: python3 csvImport.py

-----this is all phone numbers in header FROM-----

                              From
0       +34141414)
1      1231231231
2       1231213
3                  (+123123123
4       123212313
..                             ...
692    1231237)
693  A123213616)
694    12321433)
695    1312)
696  1321321)

[697 rows x 1 columns]

-----this is only unique values ----

                              From
0       +34141414)
1      1231231231
2       1231213
3                  (+123123123
4       123212313
..                             ...
692    1231237)
693  A123213616)
694    12321433)
695    1312)
696  1321321)

[279 rows x 1 columns]

-----now saving to new csv----

  • You cannot use inplace=True and also reassign the result to a variable. With inplace=True, the return value is None. – Scott Boston Apr 01 '22 at 18:32
  • `df = df.drop_duplicates(subset="From", keep="first", inplace=True)` is incorrect. Use inplace=False OR remove 'df =' from in front. – Scott Boston Apr 01 '22 at 18:33
  • I changed it to False and it works, but the CSV file still shows all rows. – Pheng Vue Apr 01 '22 at 18:40
  • -----this is only unique values ---- From 0 +234131 1 Ja134134131231) 3 32434234314 5 13414241 6 3413131) .. ... 689 12341231231 691 1311312 693 1231231231) 694 123213213 695 132131 [279 rows x 1 columns] -----now saving to new csv---- – Pheng Vue Apr 01 '22 at 18:42
  • My terminal lists the rows and then shows [279 rows x 1 columns]. Is it not replacing df with the new set? – Pheng Vue Apr 01 '22 at 18:43
  • This works. I think it was the dataset I was using.
    import pandas as pd
    import numpy as np
    d = pd.read_csv('us-500.csv')
    df = pd.DataFrame(d, columns=['phone1'])
    print(""" -----this is all phone numbers in header Phone1----- """)
    print(df)
    print(""" -----this is only unique values ---- """)
    df = df.drop_duplicates(keep="first")
    print(df)
    print(""" -----now saving to new csv---- """)
    df.to_csv("uniquePhones.csv", index=True)
    – Pheng Vue Apr 01 '22 at 18:51
  • Solved it! The code works; the problem was the type of CSV file I was using. When it was converted from Excel, it was saved as some UTF-8 CSV variant. I re-saved it as a plain CSV file and it all worked fine. Thank you for the help! – Pheng Vue Apr 01 '22 at 19:06
  • I'm glad it worked for you. Happy coding. Be safe and stay healthy. – Scott Boston Apr 01 '22 at 19:13
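
To illustrate the point made in the comments about inplace=True, here is a minimal sketch. The sample values are made up and only stand in for the "From" column of directory.csv; the names result, unique_a and unique_b are hypothetical.

```python
import pandas as pd

# Hypothetical sample data standing in for the "From" column of directory.csv.
df = pd.DataFrame({"From": ["111", "222", "111", "333", "222"]})

# With inplace=True the frame is modified directly; per the comments above,
# the call itself returns None, so "df = df.drop_duplicates(..., inplace=True)"
# would overwrite df with that return value.
result = df.copy().drop_duplicates(subset="From", keep="first", inplace=True)
print(result)

# Option 1: reassign the result and leave inplace at its default (False).
unique_a = df.drop_duplicates(subset="From", keep="first")

# Option 2: modify in place and do not reassign.
unique_b = df.copy()
unique_b.drop_duplicates(subset="From", keep="first", inplace=True)

print(unique_a["From"].tolist())
print(unique_a.equals(unique_b))
```

Either option should leave only the first occurrence of each value; the two are not meant to be combined.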

1 Answer

I had the same error and fixed it by doing:

df = df.drop_duplicates().reset_index()
df.to_csv() # Now works
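
A slightly fuller sketch of the same idea, assuming the goal from the question (one column, no duplicates, written back out as CSV). The in-memory CSV text and the "To" column are invented for illustration; in the real script the source would be directory.csv and the target uniquePhones.csv.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for directory.csv ("To" is an invented column).
raw = io.StringIO("From,To\n111,a\n222,b\n111,c\n333,d\n222,e\n")

# Read only the column of interest, as strings so phone numbers are not parsed as numbers.
phones = pd.read_csv(raw, usecols=["From"], dtype=str)

# drop=True discards the old row labels instead of adding them as an "index" column.
unique = phones.drop_duplicates().reset_index(drop=True)

# index=False keeps the row labels out of the output.
# With no path, to_csv returns the CSV text; pass "uniquePhones.csv" to write a file instead.
csv_text = unique.to_csv(index=False)
print(csv_text)
```

Note that reset_index() without drop=True, as in the snippet above it, keeps the old row labels as an extra "index" column, which may or may not be wanted in the output file.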