I have a DataFrame with a single text column. Some values are duplicated across rows, and I want to remove the duplicates. I tried `df.drop_duplicates(..., inplace=True)`, but it doesn't seem to work.

How do I solve this?

eclairs
  • Did you pass the column, as in `df.drop_duplicates(col_name)`? By default it drops only rows that are duplicated in their entirety. If you're only interested in duplicates in a specific column, pass that column as the subset to consider; see the [docs](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.drop_duplicates.html#pandas.DataFrame.drop_duplicates) – EdChum Oct 26 '15 at 16:35
  • @EdChum: The dataframe has only one column. Do I still need to mention the column name? – eclairs Oct 26 '15 at 16:41
  • That should work, but you now need to post raw input data, code to create your df, code that demonstrates the error, and the desired output; otherwise people have to use a crystal ball – EdChum Oct 26 '15 at 16:43
  • @EdChum: I tried it with the column name as well, but it did not help. My dataframe contains one column which consists of sentences. When I apply drop_duplicates() to a column containing 1 or 2 words, it works fine, but not when it comes to sentences. Is there anything that can be done? – eclairs Oct 27 '15 at 07:35
  • @EdChum: I tried it with the column name as well, but it did not help. My dataframe contains one column which consists of sentences. When I apply drop_duplicates() to a column containing 1 or 2 words, or to a smaller sample of comments, it works fine. But when it comes to the entire dataset (about 300 rows), it does not work. Is there anything that can be done? – eclairs Oct 27 '15 at 07:41
  • Please see my previous comment about posting data and code that others can use to reproduce your error – EdChum Oct 27 '15 at 09:06
  • @EdChum: The issue is resolved. All I needed to do was store the de-duplicated data in another dataset. – eclairs Oct 27 '15 at 11:56
  • Don't say "in Python" when you mean "in pandas". And when you say "in pandas", do you mean dataframe, Series, or both? Your question body is about dataframe, but pandas has `.drop_duplicates` for both Series and DataFrame. Anyway, **your question seems to revolve around confusion about whether you wanted to modify the df in-place, or return a modified df and assign it to something else.** – smci May 31 '20 at 02:29

1 Answer

The issue was that I called `drop_duplicates()` without using its return value. By default it returns a new DataFrame and leaves the original unchanged, so the result has to be stored in a variable. Doing that solved the problem.

What I was doing:
    df.drop_duplicates()

What had to be done:
    df1 = df.drop_duplicates()
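
For reference, here is a minimal sketch of the difference. The sample data and the column name `text` are made up for illustration and are not from my actual dataset. It also shows `inplace=True` and the `subset` argument mentioned in the comments, assuming a reasonably recent pandas version.

```python
import pandas as pd

# Hypothetical sample data; the column name "text" is an assumption.
df = pd.DataFrame({"text": ["this is a sentence",
                            "another sentence",
                            "this is a sentence"]})

# Without assignment the result is discarded and df is left unchanged.
df.drop_duplicates()

# Assigning the return value keeps the de-duplicated rows.
df1 = df.drop_duplicates()

# Restricting the comparison to one column via `subset`.
df2 = df.drop_duplicates(subset=["text"])

# With inplace=True the frame is modified directly and the call returns None.
df3 = df.copy()
returned = df3.drop_duplicates(inplace=True)
```

Note that assigning the result of a call with `inplace=True` back to `df` would replace it with `None`, so use either assignment or `inplace=True`, not both.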
eclairs