I have a pandas DataFrame, predictions, which consists of four columns. I created this DataFrame from lists built out of memmap arrays.
predictions = pd.DataFrame({'cell': list_1, 'tree': list_2, 'predict': list_3, 'label': list_4})
Now I want to group by two columns of this DataFrame and average a third column, as follows:
df = predictions.groupby(['tree', 'cell'])['predict'].mean()
But it raises an error saying that the memmap array is unhashable, so the groupby cannot be performed. I really need the groupby; otherwise I have to write two for loops, which takes forever because my DataFrame has 1,000,000 rows. Does anybody know a solution? Thanks
Edited
The cell and tree columns are lists of items from memmap arrays. predict and label are just normal lists.
The list of memmap array items looks like this, for cell:
[memmap([415], dtype=int32),
memmap([143], dtype=int32),
memmap([96], dtype=int32),
memmap([432], dtype=int32),
memmap([104], dtype=int32),
memmap([76], dtype=int32),
memmap([312], dtype=int32),
memmap([143], dtype=int32),
memmap([312], dtype=int32),
memmap([64], dtype=int32),
memmap([296], dtype=int32)]
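To make the description above concrete, here is a minimal sketch of how a list like this can arise and why its items cannot be hashed. It assumes a small temporary file as a stand-in for the real memmap data; the file name, the values (copied from the first few items above), and the slicing approach are illustrative only, not necessarily how the original lists were built.

```python
import os
import tempfile

import numpy as np

# Hypothetical stand-in for the real data: a tiny memmap backed by a temp file.
path = os.path.join(tempfile.mkdtemp(), "cells.dat")
values = [415, 143, 96, 143]
mm = np.memmap(path, dtype=np.int32, mode="w+", shape=(len(values),))
mm[:] = values

# Length-1 slices are still memmap objects, like the items shown above.
cell = [mm[i:i + 1] for i in range(len(values))]

# Arrays (including memmap) define no hash, so hash() raises TypeError.
try:
    hash(cell[0])
except TypeError as exc:
    print("cannot hash item:", exc)
```

Each element of cell is a one-element array rather than a plain integer, which is why the column shows values like [415] instead of 415.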
The predictions DataFrame looks like this:
     cell  label  predict  tree
0   [415]      0        1  [19]
1   [143]      1        1  [22]
2    [96]      0        1  [19]
3   [432]      1        1  [12]
4   [104]      0        1  [21]
5    [76]      0        1  [19]
6   [312]      1        1  [22]
7   [143]      1        1  [22]
8   [312]      1        1  [22]
9    [64]      0        1  [18]
10  [296]      1        1  [22]
I get the following error:
predictions_target = predictions.groupby(['tree', 'cell'])['predict'].mean()
File "/usr/venv/local/lib/python2.7/site-packages/pandas/core/groupby.py", line 1015, in mean
return self._python_agg_general(f)
File "/usr/venv/local/lib/python2.7/site-packages/pandas/core/groupby.py", line 826, in _python_agg_general
return self._python_apply_general(f)
File "/usr/venv/local/lib/python2.7/site-packages/pandas/core/groupby.py", line 698, in _python_apply_general
self.axis)
File "/usr/venv/local/lib/python2.7/site-packages/pandas/core/groupby.py", line 1577, in apply
splitter = self._get_splitter(data, axis=axis)
File "/usr/venv/local/lib/python2.7/site-packages/pandas/core/groupby.py", line 1563, in _get_splitter
comp_ids, _, ngroups = self.group_info
File "pandas/src/properties.pyx", line 34, in pandas.lib.cache_readonly.__get__ (pandas/lib.c:44222)
File "/usr/venv/local/lib/python2.7/site-packages/pandas/core/groupby.py", line 1670, in group_info
comp_ids, obs_group_ids = self._get_compressed_labels()
File "/usr/venv/local/lib/python2.7/site-packages/pandas/core/groupby.py", line 1677, in _get_compressed_labels
all_labels = [ping.labels for ping in self.groupings]
File "/usr/venv/local/lib/python2.7/site-packages/pandas/core/groupby.py", line 2308, in labels
self._make_labels()
File "/usr/venv/local/lib/python2.7/site-packages/pandas/core/groupby.py", line 2319, in _make_labels
labels, uniques = algos.factorize(self.grouper, sort=self.sort)
File "/usr/venv/local/lib/python2.7/site-packages/pandas/core/algorithms.py", line 313, in factorize
labels = table.get_labels(vals, uniques, 0, na_sentinel, True)
File "pandas/src/hashtable_class_helper.pxi", line 843, in pandas.hashtable.PyObjectHashTable.get_labels (pandas/hashtable.c:14831)
TypeError: unhashable type: 'memmap'
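The traceback shows the failure happening while pandas factorizes the grouping columns, which requires hashable values. One possible workaround, sketched here under the assumption that every item in cell and tree holds exactly one integer, is to unwrap each item into a plain Python int before building the DataFrame. The data below is a hypothetical stand-in: ordinary one-element int32 arrays (unhashable in the same way as memmap items) and made-up predict values chosen so the means differ.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in data: one-element arrays, as in the cell and tree lists.
cell = [np.array([c], dtype=np.int32) for c in [415, 143, 96, 143, 312, 312]]
tree = [np.array([t], dtype=np.int32) for t in [19, 22, 19, 22, 22, 22]]
predict = [1, 1, 0, 1, 0, 1]  # made-up values for illustration
label = [0, 1, 0, 1, 1, 1]

# Unwrap each one-element array into a plain int so the columns are hashable.
predictions = pd.DataFrame({
    "cell": [int(c[0]) for c in cell],
    "tree": [int(t[0]) for t in tree],
    "predict": predict,
    "label": label,
})

# With scalar keys, the groupby can factorize both columns.
result = predictions.groupby(["tree", "cell"])["predict"].mean()
print(result)
```

With a million rows, flattening with np.concatenate(cell) instead of the list comprehension may be faster, though that has not been measured here.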