5

all,

I have a column in a dataframe that looks like this:

allHoldingsFund['BrokerMixed']
Out[419]: 
78         ML
81       CITI
92         ML
173      CITI
235        ML
262        ML
264        ML
25617      GS
25621    CITI
25644    CITI
25723      GS
25778    CITI
25786    CITI
25793      GS
25797    CITI
Name: BrokerMixed, Length: 2554, dtype: object

Although the column has dtype object, I am not able to group by it or even extract its unique values. For example, when I do:

allHoldingsFund['BrokerMixed'].unique()

I get an error:

     uniques = table.unique(values)
  File "pandas/_libs/hashtable_class_helper.pxi", line 1340, in pandas._libs.hashtable.PyObjectHashTable.unique
TypeError: unhashable type: 'numpy.ndarray'

I also get an error when I do a groupby on that column.

Any help is welcome. Thank you

SBad

3 Answers

5

You have an array in your data column. You could try the following:

allHoldingsFund['BrokerMixed'].apply(lambda x: str(x)).unique()
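A minimal sketch of the string-conversion idea above, assuming the column mixes plain strings with an occasional NumPy array. The series `broker` and its values are made up for illustration and are not the asker's data.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for allHoldingsFund['BrokerMixed']:
# mostly strings, with one value that is a NumPy array.
broker = pd.Series(["ML", "CITI", np.array(["GS", "ML"]), "ML", "GS"])

# str() turns every element, including the array, into a hashable string.
as_text = broker.apply(str)

# unique() now works because every element is a plain string.
uniques = as_text.unique()
print(uniques)
```

The array is replaced by the text of its printed form, so it stays distinct from the plain string 'GS'. This makes the approach more useful for spotting the odd values than for cleaning them.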
Sahil Puri
  • What do you get as the error? Can you answer my comment on your question? – Sahil Puri Aug 03 '18 at 14:58
  • `File "pandas/_libs/hashtable_class_helper.pxi", line 1340, in pandas._libs.hashtable.PyObjectHashTable.unique TypeError: unhashable type: 'numpy.ndarray'` – SBad Aug 03 '18 at 14:59
  • error `AttributeError: 'StringMethods' object has no attribute 'unique' ` – SBad Aug 03 '18 at 15:11
  • I can only suggest that you look at the variable `a=allHoldingsFund.loc[78, 'BrokerMixed']` and try to convert it into a string, then apply the function to your column and take the unique value. – Sahil Puri Aug 03 '18 at 15:19
5

It looks like you have a NumPy array in your series. NumPy arrays are not hashable, and pd.Series.unique, like set, relies on hashing.

If you can't ensure your series data only consists of strings, you can convert NumPy arrays to tuples before calling pd.Series.unique:

import numpy as np
import pandas as pd

s = pd.Series([np.array([1, 2, 3]), 1, 'hello', 'test', 1, 'test'])

def tuplizer(x):
    return tuple(x) if isinstance(x, (np.ndarray, list)) else x

res = s.apply(tuplizer).unique()

print(repr(res))

array([(1, 2, 3), 1, 'hello', 'test'], dtype=object)

Of course, this means your data type information is lost in the result, but at least you get to see your "unique" NumPy arrays, provided they are 1-dimensional.
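Since the question also mentions groupby failing, here is a rough sketch of how the same tuple conversion could serve as a grouping key. The frame `df`, the `Quantity` column and the helper name `to_hashable` are hypothetical and only illustrate the idea.

```python
import numpy as np
import pandas as pd

def to_hashable(x):
    # Arrays and lists become tuples; everything else passes through unchanged.
    return tuple(x) if isinstance(x, (np.ndarray, list)) else x

# Hypothetical data: two rows hold a 1-dimensional array instead of a string.
df = pd.DataFrame({
    "BrokerMixed": ["ML", "CITI", np.array(["GS", "ML"]), "ML", np.array(["GS", "ML"])],
    "Quantity": [10, 20, 30, 40, 50],
})

# Build a hashable key and group by it; sort=False avoids ordering
# strings against tuples.
key = df["BrokerMixed"].apply(to_hashable)
totals = df.groupby(key, sort=False)["Quantity"].sum().to_dict()
print(totals)
```

As with unique, this assumes the arrays are 1-dimensional, because tuple() applied to a 2-dimensional array yields a tuple of arrays, which is still unhashable.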

jpp
0

First, I would suggest checking the type of your column. You can try the following:

print(type(allHoldingsFund['BrokerMixed']))

If this is a pandas Series, you can try:

allHoldingsFund['BrokerMixed'].reset_index()['BrokerMixed'].unique()

and check if this works for you.

EDIT 2020: Your way of getting the unique values and the answers mentioned here fetch the same results using Python 3.


Hari_pb
  • Thanks Harry_pb. `type(allHoldingsFund['BrokerMixed'])` gives `pandas.core.series.Series`. Your code `allHoldingsFund['BrokerMixed'].reset_index()['BrokerMixed'].unique()` gives an error. How can I get rid of the array and make the whole column an object, please? – SBad Aug 03 '18 at 14:56