I am using Python 2.7 with a dask dataframe.
I have a dataframe that is too big for memory but fits on disk comfortably.
I group by an index and then need to iterate over the groups. I found a suggestion here for how to do it.
When I try to use the suggested code:
for value in drx["col"].unique():
    print value
I get an error:
File "/usr/local/lib/python2.7/dist-packages/dask/dataframe/core.py", line 1709, in __getitem__
    raise NotImplementedError()
NotImplementedError
Assuming that direct iteration is simply not implemented, I found that the series returned by unique() can be iterated with iteritems().
But when I try to use it like so:
data = table["col"].unique()
it = data.iteritems()
for val in it:
    print 1
My memory usage explodes, as if all the values of the column are kept in memory for as long as I use the iterator.
How can I use the iterator values without saving all of them into memory?
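To make the goal concrete, here is a rough sketch of the behaviour I am after, written without dask. The plain lists stand in for dataframe partitions, and `iter_unique` and `partitions` are hypothetical names used only for illustration. The idea is that only one chunk plus the set of distinct values seen so far is held at a time, never the whole column.

```python
from __future__ import print_function


def iter_unique(chunks):
    """Yield each distinct value once, reading one chunk at a time."""
    seen = set()  # holds only the distinct values, not the whole column
    for chunk in chunks:
        for value in chunk:
            if value not in seen:
                seen.add(value)
                yield value


# Hypothetical stand-in data: each inner list plays the role of one partition.
partitions = [[1, 2, 2], [2, 3], [1, 4]]
result = list(iter_unique(partitions))
print(result)
```

This still keeps every distinct value in `seen`, so it only helps when the number of unique values is much smaller than the column itself. What I am looking for is the dask equivalent of feeding partitions into such a loop one at a time.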