I have a pandas DataFrame, predictions, which consists of four columns. I created this DataFrame using lists built from memmap arrays.

    predictions = pd.DataFrame({'cell': list_1, 'tree': list_2, 'predict': list_3, 'label': list_4})

Now I want to group by two columns of this DataFrame and average a third column, as follows:

    df = predictions.groupby(['tree', 'cell'])['predict'].mean()

But it raises an error saying that the memmap type is unhashable, so the groupby cannot be performed. I really need the groupby; otherwise I have to write two for loops, which takes forever because my DataFrame has 1,000,000 rows. Does anybody know a solution? Thanks.

Edit: the cell and tree columns are lists of items from memmap arrays; predict and label are just normal lists. The list of memmap array items for cell looks like this:

    [memmap([415], dtype=int32),
     memmap([143], dtype=int32),
     memmap([96], dtype=int32),
     memmap([432], dtype=int32),
     memmap([104], dtype=int32),
     memmap([76], dtype=int32),
     memmap([312], dtype=int32),
     memmap([143], dtype=int32),
     memmap([312], dtype=int32),
     memmap([64], dtype=int32),
     memmap([296], dtype=int32)]
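
For reference, here is a minimal sketch of how a list like this can come about and why its items cannot be hashed. The backing file, its shape, and the values are assumptions for illustration only; my real memmap is created elsewhere.

```python
import os
import tempfile

import numpy as np

# Hypothetical backing file; the real memmap source is not shown in this post.
path = os.path.join(tempfile.mkdtemp(), "cells.dat")
cells = np.memmap(path, dtype=np.int32, mode="w+", shape=(4, 1))
cells[:] = [[415], [143], [96], [143]]

# Taking rows one at a time gives one-element arrays, not plain integers.
list_1 = [cells[i] for i in range(len(cells))]

# NumPy arrays (memmap included) are mutable and do not support hash().
try:
    hash(list_1[0])
    hashable = True
except TypeError:
    hashable = False

print(hashable)
```

Since groupby has to hash the key values, a column holding items like these would presumably hit the same TypeError.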

The predictions DataFrame looks like this:

          cell  label  predict  tree
    0    [415]      0        1  [19]
    1    [143]      1        1  [22]
    2     [96]      0        1  [19]
    3    [432]      1        1  [12]
    4    [104]      0        1  [21]
    5     [76]      0        1  [19]
    6    [312]      1        1  [22]
    7    [143]      1        1  [22]
    8    [312]      1        1  [22]
    9     [64]      0        1  [18]
    10   [296]      1        1  [22]

I get the following error:

        predictions_target = predictions.groupby(['tree', 'cell'])['predict'].mean()
      File "/usr/venv/local/lib/python2.7/site-packages/pandas/core/groupby.py", line 1015, in mean
        return self._python_agg_general(f)
      File "/usr/venv/local/lib/python2.7/site-packages/pandas/core/groupby.py", line 826, in _python_agg_general
        return self._python_apply_general(f)
      File "/usr/venv/local/lib/python2.7/site-packages/pandas/core/groupby.py", line 698, in _python_apply_general
        self.axis)
      File "/usr/venv/local/lib/python2.7/site-packages/pandas/core/groupby.py", line 1577, in apply
        splitter = self._get_splitter(data, axis=axis)
      File "/usr/venv/local/lib/python2.7/site-packages/pandas/core/groupby.py", line 1563, in _get_splitter
        comp_ids, _, ngroups = self.group_info
      File "pandas/src/properties.pyx", line 34, in pandas.lib.cache_readonly.__get__ (pandas/lib.c:44222)
      File "/usr/venv/local/lib/python2.7/site-packages/pandas/core/groupby.py", line 1670, in group_info
        comp_ids, obs_group_ids = self._get_compressed_labels()
      File "/usr/venv/local/lib/python2.7/site-packages/pandas/core/groupby.py", line 1677, in _get_compressed_labels
        all_labels = [ping.labels for ping in self.groupings]
      File "/usr/venv/local/lib/python2.7/site-packages/pandas/core/groupby.py", line 2308, in labels
        self._make_labels()
      File "/usr/venv/local/lib/python2.7/site-packages/pandas/core/groupby.py", line 2319, in _make_labels
        labels, uniques = algos.factorize(self.grouper, sort=self.sort)
      File "/usr/venv/local/lib/python2.7/site-packages/pandas/core/algorithms.py", line 313, in factorize
        labels = table.get_labels(vals, uniques, 0, na_sentinel, True)
      File "pandas/src/hashtable_class_helper.pxi", line 843, in pandas.hashtable.PyObjectHashTable.get_labels (pandas/hashtable.c:14831)
    TypeError: unhashable type: 'memmap'
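
One direction that might work, as a sketch rather than a confirmed fix: flatten the one-element items into plain integer columns before building the DataFrame, so that the groupby keys are hashable scalars. The sample values below are made up, and plain NumPy arrays stand in for the memmap items on the assumption that they behave the same way here.

```python
import numpy as np
import pandas as pd

# Stand-ins for the one-element memmap items (assumed equivalent for this purpose).
list_1 = [np.array([415], dtype=np.int32),
          np.array([143], dtype=np.int32),
          np.array([143], dtype=np.int32)]   # cell
list_2 = [np.array([19], dtype=np.int32),
          np.array([22], dtype=np.int32),
          np.array([22], dtype=np.int32)]    # tree
list_3 = [1, 0, 1]                           # predict
list_4 = [0, 1, 1]                           # label

predictions = pd.DataFrame({
    # Stack the items into an (n, 1) array, then flatten to a 1-D integer column.
    'cell': np.asarray(list_1).ravel(),
    'tree': np.asarray(list_2).ravel(),
    'predict': list_3,
    'label': list_4,
})

# The keys are now integers, so they can be hashed and grouped.
df = predictions.groupby(['tree', 'cell'])['predict'].mean()
print(df)
```

Converting each column in one vectorized step, rather than item by item in a Python loop, should matter at 1,000,000 rows, though I have not measured it.
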
ga97rasl