I am running Windows 10 with Jupyter Notebook 4.0.6, Python 2.7.10, and Anaconda 2.4.0 (64-bit).
I am following the tutorial at https://jakevdp.github.io/blog/2015/08/14/out-of-core-dataframes-in-python/ and running this code:
from dask import dataframe as dd
columns = ["name", "amenity", "Longitude", "Latitude"]
data = dd.read_csv("POIWorld.csv", usecols=columns)
with_name = data[data.name.notnull()]
with_amenity = data[data.amenity.notnull()]
is_starbucks = with_name.name.str.contains('[Ss]tarbucks')
is_dunkin = with_name.name.str.contains('[Dd]unkin')
starbucks = with_name[is_starbucks]
dunkin = with_name[is_dunkin]
dd.compute(starbucks.name.count(), dunkin.name.count())
The last statement produces the following error in the command prompt session running Jupyter:
Fatal Python error: GC object already tracked
From reading similar questions, this could be an issue in the dask source code related to how Python manages memory, but I'm hoping I'm just missing something.
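One way to check whether the crash is specific to dask would be to do the same count with plain pandas, reading the file in chunks so it never has to fit in memory at once. The following is only a rough sketch and not from the tutorial: the helper name `count_name_matches`, the chunk size, and the tiny inline sample standing in for POIWorld.csv are all made up for illustration.

```python
import io

import pandas as pd

columns = ["name", "amenity", "Longitude", "Latitude"]


def count_name_matches(csv_source, pattern, chunksize=100000):
    """Count rows whose non-null name matches a regex, one chunk at a time.

    Hypothetical helper for illustration; it is not part of dask or pandas.
    """
    total = 0
    reader = pd.read_csv(
        csv_source,
        usecols=columns,
        dtype={"name": str},  # keep names as strings even if a chunk is all empty
        chunksize=chunksize,
    )
    for chunk in reader:
        names = chunk["name"].dropna()  # same idea as data.name.notnull()
        total += int(names.str.contains(pattern).sum())
    return total


# Made-up sample data standing in for POIWorld.csv
sample_csv = (
    "name,amenity,Longitude,Latitude\n"
    "Starbucks,cafe,-122.3,47.6\n"
    "starbucks Coffee,cafe,-71.0,42.3\n"
    "Dunkin Donuts,cafe,-71.1,42.4\n"
    ",bench,1.0,2.0\n"
    "Local Diner,restaurant,3.0,4.0\n"
)

starbucks_count = count_name_matches(io.StringIO(sample_csv), "[Ss]tarbucks", chunksize=2)
dunkin_count = count_name_matches(io.StringIO(sample_csv), "[Dd]unkin", chunksize=2)
print(starbucks_count, dunkin_count)
```

Passing "POIWorld.csv" instead of the in-memory sample should run the same count on the real file. If that finishes without the fatal error, it would point toward dask or its scheduler rather than pandas itself.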
I previously hit a header-related issue with dask in this tutorial and had to upgrade dask from GitHub:
pip install git+https://github.com/blaze/dask.git --upgrade
Similar questions that do not help: