df['X'].unique() and TypeError: unhashable type: 'numpy.ndarray'

Question

Hi all,

I have a column in a dataframe that looks like this:

allHoldingsFund['BrokerMixed']
Out[419]: 
78         ML
81       CITI
92         ML
173      CITI
235        ML
262        ML
264        ML
25617      GS
25621    CITI
25644    CITI
25723      GS
25778    CITI
25786    CITI
25793      GS
25797    CITI
Name: BrokerMixed, Length: 2554, dtype: object

The column's dtype is object, but I am not able to group by that column or even extract its unique values. For example, when I do:

allHoldingsFund['BrokerMixed'].unique()

I get an error:

     uniques = table.unique(values)
  File "pandas/_libs/hashtable_class_helper.pxi", line 1340, in pandas._libs.hashtable.PyObjectHashTable.unique
TypeError: unhashable type: 'numpy.ndarray'

I also get an error when I use groupby on that column.

Any help is welcome. Thank you

Can you please show a short sample of your overall dataframe where you have "BrokerMixed" column? — user2906838, Aug 03 '18 at 14:49
can you show the output for `print allHoldingsFund.loc[78, 'BrokerMixed']` — Sahil Puri, Aug 03 '18 at 14:49
@Sahil `allHoldingsFund.loc[78, 'BrokerMixed'] Out[422]: array('ML', dtype=' — SBad, Aug 03 '18 at 14:51
Now try `print allHoldingsFund.loc[78, 'BrokerMixed'][0]` and `print type(allHoldingsFund.loc[78, 'BrokerMixed'][0])` — Sahil Puri, Aug 03 '18 at 14:53
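Building on the diagnostic suggested in the comments above, here is a minimal sketch for finding which cells are not plain strings. The `broker` series is made-up data standing in for `allHoldingsFund['BrokerMixed']`, and the one-element arrays are an assumption about what the stray values look like.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for allHoldingsFund['BrokerMixed']:
# mostly plain strings, with two cells that hold NumPy arrays.
broker = pd.Series(
    ["ML", np.array(["CITI"]), "ML", "GS", np.array(["GS"])],
    dtype=object,
    name="BrokerMixed",
)

# Count cells per Python type. Type objects are hashable, so this works
# even though unique() on the raw column does not.
type_counts = broker.map(type).value_counts()
print(type_counts)

# Boolean mask marking the rows whose cell is an ndarray.
is_array = broker.map(lambda cell: isinstance(cell, np.ndarray))
print(broker[is_array])
```

If the mask selects any rows, those are the cells that need converting before `unique()` or `groupby` can hash the column.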

Sahil Puri · Answer 1 · 2018-08-03T15:12:49.097

5

You have an array in your data column. You could try the following:

allHoldingsFund['BrokerMixed'].apply(lambda x: str(x)).unique()
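A small sketch of what the `str` conversion does to array cells, using made-up data rather than the asker's frame. It suggests that the result depends on the array's shape: a zero-dimensional array turns into the bare label, while a one-dimensional array keeps its brackets and so stays a separate value.

```python
import numpy as np
import pandas as pd

# str() on a zero-dimensional array yields just the element's text,
# while a one-dimensional array keeps its brackets and quotes.
zero_dim_text = str(np.array("ML"))
one_dim_text = str(np.array(["ML"]))
print(zero_dim_text)
print(one_dim_text)

# Hypothetical column mixing plain strings with a one-element array.
broker = pd.Series(["ML", np.array(["ML"]), "CITI", "ML"], dtype=object)

# Every cell becomes a hashable str, so unique() no longer raises.
as_text = broker.apply(str)
print(as_text.unique())
```

It is worth checking the converted values before grouping on them, since the bracketed text form counts as a different key from the plain string.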

edited Aug 03 '18 at 15:12

answered Aug 03 '18 at 14:54

Sahil Puri


What error do you get? Can you answer my comment on your question? – Sahil Puri Aug 03 '18 at 14:58
`File "pandas/_libs/hashtable_class_helper.pxi", line 1340, in pandas._libs.hashtable.PyObjectHashTable.unique TypeError: unhashable type: 'numpy.ndarray'` – SBad Aug 03 '18 at 14:59
error `AttributeError: 'StringMethods' object has no attribute 'unique' ` – SBad Aug 03 '18 at 15:11
I can only suggest that you look at the variable `a=allHoldingsFund.loc[78, 'BrokerMixed']` and try to convert it into a string, then apply the function to your column and take the unique value. – Sahil Puri Aug 03 '18 at 15:19

score 5 · Answer 2 · answered Aug 03 '18 at 14:59

It looks like you have a NumPy array in your series. NumPy arrays are not hashable, and pd.Series.unique, like set, relies on hashing.
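To illustrate the hashing point, a short sketch with an arbitrary example array (not the asker's data): `hash()` rejects the array, while a tuple holding the same values is accepted and can be used in a set.

```python
import numpy as np

arr = np.array([1, 2, 3])

# ndarrays are mutable and define no hash, so hash() raises TypeError.
try:
    hash(arr)
    array_hashable = True
except TypeError as exc:
    array_hashable = False
    print(exc)

# A tuple of plain Python values is immutable and hashable.
as_tuple = tuple(arr.tolist())
print(hash(as_tuple) == hash((1, 2, 3)))

# Sets accept the tuple form and treat equal tuples as one entry.
seen = {as_tuple, (1, 2, 3)}
print(len(seen))
```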

If you can't ensure your series data only consists of strings, you can convert NumPy arrays to tuples before calling pd.Series.unique:

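import numpy as np
import pandas as pd

s = pd.Series([np.array([1, 2, 3]), 1, 'hello', 'test', 1, 'test'])

def tuplizer(x):
    # Lists and arrays are unhashable; tuples of the same values are hashable
    return tuple(x) if isinstance(x, (np.ndarray, list)) else x

res = s.apply(tuplizer).unique()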

print(res)

array([(1, 2, 3), 1, 'hello', 'test'], dtype=object)

Of course, this means your data type information is lost in the result, but at least you get to see your "unique" NumPy arrays, provided they are 1-dimensional.
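The tuple conversion needs arrays it can iterate over, and `tuple()` raises on a zero-dimensional array. The array shown in the question's comments appears to hold a single value, so here is a hedged variant that collapses single-element arrays to their scalar first. The helper name `normalize_cell`, the `holdings` frame, and the `Quantity` column are all hypothetical and used only for illustration.

```python
import numpy as np
import pandas as pd

def normalize_cell(cell):
    """Return a hashable stand-in for a cell (hypothetical helper)."""
    if isinstance(cell, np.ndarray):
        if cell.size == 1:
            # Covers zero-dimensional arrays, which tuple() cannot iterate.
            return cell.item()
        # One-dimensional arrays become tuples of plain Python values.
        return tuple(cell.tolist())
    return cell

# Made-up frame; the Quantity column is an assumption for illustration.
holdings = pd.DataFrame({
    "BrokerMixed": pd.Series(
        ["ML", np.array(["CITI"]), "ML", "CITI"], dtype=object
    ),
    "Quantity": [10, 20, 30, 40],
})

# After normalizing, every cell is hashable, so unique() and groupby work.
holdings["BrokerMixed"] = holdings["BrokerMixed"].apply(normalize_cell)
uniques = holdings["BrokerMixed"].unique()
totals = holdings.groupby("BrokerMixed")["Quantity"].sum()
print(uniques)
print(totals)
```

Collapsing single-element arrays to scalars is a design choice: it merges `array(['CITI'])` with the plain string `'CITI'`, which is probably what a broker column wants but may not suit other data.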

Thanks jpp. I think I'll make some code changes to ensure the column contains only objects, as opposed to a mix of objects and arrays. — SBad, Aug 03 '18 at 15:04

Hari_pb · Accepted Answer · 2020-03-23T17:21:41.473

0

First, I would suggest checking the type of your column. You may try the following:

print(type(allHoldingsFund['BrokerMixed']))

If this is a pandas Series, you may try

allHoldingsFund['BrokerMixed'].reset_index()['BrokerMixed'].unique()

and check if this works for you.

EDIT 2020: With Python 3, your original way of getting the unique values and the answers mentioned here return the same results.

edited Mar 23 '20 at 17:21

answered Aug 03 '18 at 14:52

Hari_pb


Thanks Harry_pb. `type(allHoldingsFund['BrokerMixed']) Out[423]: pandas.core.series.Series ` Your code `allHoldingsFund['BrokerMixed'].reset_index()['BrokerMixed'].unique()` gives an error. How can I get rid of the array and make the whole column an object, please? – SBad Aug 03 '18 at 14:56
