Why is running a .compute() in dask causing "Fatal Python error: GC object already tracked"

Question

I am running Windows 10 with Jupyter Notebook 4.0.6, Python 2.7.10, and Anaconda 2.4.0 (64-bit).

I am following the tutorial at https://jakevdp.github.io/blog/2015/08/14/out-of-core-dataframes-in-python/:

from dask import dataframe as dd
columns = ["name", "amenity", "Longitude", "Latitude"]
data = dd.read_csv("POIWorld.csv", usecols=columns)
with_name = data[data.name.notnull()]
with_amenity = data[data.amenity.notnull()]
is_starbucks = with_name.name.str.contains('[Ss]tarbucks')
is_dunkin = with_name.name.str.contains('[Dd]unkin')
starbucks = with_name[is_starbucks]
dunkin = with_name[is_dunkin]
dd.compute(starbucks.name.count(), dunkin.name.count())

The last statement produces the following error in the command prompt session that is running Jupyter:

Fatal Python error: GC object already tracked

From reading similar questions, this could be an issue in the dask source code related to how Python manages memory, but I'm hoping I'm just missing something.

I previously hit an issue with headers and dask in this tutorial, and to get past it I had to run:

pip install git+https://github.com/blaze/dask.git --upgrade

Similar questions that do not help:

Fatal Python error: GC object already tracked

Debugging Python Fatal Error: GC Object already Tracked

Try this with the previous version of Pandas. I believe that Pandas 0.17.1 introduced some unsafe thread features. Try `pip install pandas==0.17.0` — MRocklin, Dec 07 '15 at 15:54

score 2 · Answer 1 · answered Jun 09 '16 at 14:50

Some versions of Pandas do not handle multiple threads well, especially in pandas.read_csv. These threading problems are fixed in recent versions of Pandas, so this error can probably be resolved by upgrading with one of the following:

conda install pandas

pip install pandas --upgrade
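
Before upgrading, it may help to confirm which Pandas version is actually installed in the environment that Jupyter uses. The following is a minimal sketch, not an official check: the helper names `parse_version` and `is_suspect_pandas` are hypothetical, and the set of suspect versions contains only 0.17.1 because that is the one release named in the comment on the question. Other releases may also be affected.

```python
import re

# Assumption: 0.17.1 is the only release flagged here, taken from the
# comment on the question. This is not an authoritative list.
SUSPECT_PANDAS_VERSIONS = {(0, 17, 1)}


def parse_version(version):
    """Return the leading numeric parts of a version string as a 3-tuple.

    A missing patch component is treated as 0, so "0.18" -> (0, 18, 0).
    """
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError("unrecognized version string: %r" % version)
    return tuple(int(part) if part is not None else 0
                 for part in match.groups())


def is_suspect_pandas(version):
    """Return True if the version is one of the releases flagged above."""
    return parse_version(version) in SUSPECT_PANDAS_VERSIONS


if __name__ == "__main__":
    try:
        import pandas
    except ImportError:
        print("pandas is not installed in this environment")
    else:
        if is_suspect_pandas(pandas.__version__):
            print("pandas %s is flagged; consider upgrading or pinning 0.17.0"
                  % pandas.__version__)
        else:
            print("pandas %s is not in the flagged list" % pandas.__version__)
```

Running this inside the notebook (rather than in a separate shell) checks the same interpreter that executes the dask computation, which matters when several Python environments are installed.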
