
So in my code I am deleting duplicates. The problem is that some of my data has no entries, and because the code deletes duplicates, the rows with no entries get deleted too. I am processing millions of entries, so I can't just go in and add a fake entry to the data. I need a line of code that will ignore the blank entries and not delete them. I am only checking for duplicates within a column, not across a row. I am using pandas because the data is in CSV files. Thanks in advance.

Array example: 1,1 2,2 3,3 4,"" 5,5 6,"" 1,1 2,2

What I want to happen to the array: 1,1 2,2 3,3 4,"" 5,5 6,""

What actually happens: 1,1 2,2 3,3 5,5

`df = df.drop_duplicates(subset=[1])`

`df = df.drop_duplicates(subset=[2])`

Ben D

1 Answer


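You could filter out the rows with empty values, drop duplicates from the remaining rows, and then concatenate the two parts back together. Sorting by index afterwards restores the original row order.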

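import pandas as pd

df = pd.DataFrame({'col1': ['1','1 2','2 3','3 4','','5','5 6','','1','1 2','2']})
dfempty = df.loc[df.col1 == ""]  # rows with a blank value, kept as-is
df2 = df.loc[df.col1 != ""].drop_duplicates()  # non-blank rows, de-duplicated
pd.concat([dfempty, df2]).sort_index()  # recombine in the original row order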


    col1
0   1
1   1 2
2   2 3
3   3 4
4   
5   5
6   5 6
7   
10  2
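
Since the question's data comes from CSV files, blank fields may well arrive as `NaN` rather than as empty strings, and the check is on a single column of a multi-column frame. A minimal sketch of the same idea using a boolean mask is below. The function name `drop_duplicates_keep_blanks`, the column names `id` and `value`, and the inline CSV text are all made up for illustration; they are not from the question.

```python
import io

import pandas as pd


def drop_duplicates_keep_blanks(df, column):
    """Drop rows whose `column` value repeats an earlier row, but keep every blank row.

    "Blank" here is an assumption: NaN, or a string that is empty after stripping.
    """
    blank = df[column].isna() | (df[column].astype(str).str.strip() == "")
    # duplicated() flags every occurrence after the first one in `column`.
    repeated = df.duplicated(subset=[column], keep="first")
    # Keep a row if it is blank, or if it is the first occurrence of its value.
    return df[blank | ~repeated]


# Hypothetical CSV mirroring the array in the question.
csv_text = 'id,value\n1,1\n2,2\n3,3\n4,""\n5,5\n6,""\n1,1\n2,2\n'
df = pd.read_csv(io.StringIO(csv_text))

result = drop_duplicates_keep_blanks(df, "value")
print(result)
```

With this sample, the rows with `id` 1 through 6 are kept and the two repeated rows at the end are dropped. Because the mask is computed in one pass and no frames are split and concatenated, the original row order is preserved without a `sort_index()` call.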