
I have a Series whose entries are sets. I want to remove all duplicate entries using pandas.Series.drop_duplicates(), but I get an error. Here is an example:

import pandas as pd
ser = pd.Series([{1,2,3}, {4,5,6}, {4,5,6}])
ser.drop_duplicates()

The last line gives the following exception:

TypeError: unhashable type: 'set'

Whereas I would like to get:

0    {1, 2, 3}
1    {4, 5, 6}

Is this a bug? Or is there another way to achieve this?

splinter

1 Answer


You can convert the entries with astype(str), call duplicated on the result, and use the inverted mask to filter the original Series:

ser[~ser.astype(str).duplicated(keep='first')]
Out[170]: 
0    {1, 2, 3}
1    {4, 5, 6}
dtype: object

More info, the intermediate boolean mask looks like this:

ser.astype(str).duplicated(keep='first')
Out[171]: 
0    False
1    False
2     True
dtype: bool
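
A possible variant, offered as a sketch rather than as part of the original answer: as far as I know, the string form of a set follows its iteration order, which is not guaranteed to be identical for two equal sets, so comparing strings could in principle miss a duplicate. Mapping each entry to a frozenset (which is hashable) avoids relying on the string form. The names mask and result below are just illustrative.

```python
import pandas as pd

ser = pd.Series([{1, 2, 3}, {4, 5, 6}, {4, 5, 6}])

# frozenset is hashable, so duplicated() can hash and compare the entries directly.
mask = ser.map(frozenset).duplicated(keep='first')

# Filter the original Series so the result still holds plain sets.
result = ser[~mask]
print(result)
```

Filtering the original Series with the mask, instead of calling drop_duplicates() on the mapped Series, keeps the entries as ordinary sets in the output.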
BENY