
I am running Jupyter Notebook 4.0.6 with Python 2.7.10 and Anaconda 2.4.0 (64-bit) on Windows 10.

I am following the blog tutorial at https://jakevdp.github.io/blog/2015/08/14/out-of-core-dataframes-in-python/ and running this code:

from dask import dataframe as dd
columns = ["name", "amenity", "Longitude", "Latitude"]
data = dd.read_csv("POIWorld.csv", usecols=columns)
with_name = data[data.name.notnull()]
with_amenity = data[data.amenity.notnull()]
is_starbucks = with_name.name.str.contains('[Ss]tarbucks')
is_dunkin = with_name.name.str.contains('[Dd]unkin')
starbucks = with_name[is_starbucks]
dunkin = with_name[is_dunkin]
dd.compute(starbucks.name.count(), dunkin.name.count())
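A single-threaded check that does not involve dask's threaded scheduler or pandas at all can help confirm that the data itself is fine and give reference counts to compare against. This is a sketch, not the tutorial's method. It assumes POIWorld.csv has a header row containing a `name` column, and `count_chains` is a hypothetical helper name. Because pandas also treats strings such as "NA" as missing by default, the counts may differ slightly from the dask result.

```python
import csv
import re

# The same regular expressions the tutorial passes to str.contains,
# which performs a regex *search* (a match anywhere in the string).
STARBUCKS = re.compile(r"[Ss]tarbucks")
DUNKIN = re.compile(r"[Dd]unkin")


def count_chains(lines):
    """Count rows whose 'name' matches each pattern, one row at a time.

    `lines` is any iterable of CSV text lines with a header row,
    e.g. an open file object (hypothetical helper, not part of dask).
    """
    starbucks = 0
    dunkin = 0
    for row in csv.DictReader(lines):
        name = row.get("name") or ""
        if not name:
            # Roughly data[data.name.notnull()]: empty fields are skipped.
            continue
        if STARBUCKS.search(name):
            starbucks += 1
        if DUNKIN.search(name):
            dunkin += 1
    return starbucks, dunkin


if __name__ == "__main__":
    # Tiny in-memory sample; for the real file, pass an open file object instead.
    sample = [
        "name,amenity,Longitude,Latitude",
        "Starbucks Coffee,cafe,1.0,2.0",
        "starbucks,cafe,1.0,2.0",
        "Dunkin' Donuts,cafe,1.0,2.0",
        ",cafe,1.0,2.0",
    ]
    print(count_chains(sample))  # -> (2, 1)
```

For the real file, open it in binary mode on Python 2 (`open("POIWorld.csv", "rb")`) or with `newline=""` on Python 3, as the csv module expects. The file is read as a stream, so memory use stays small even for a large CSV.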

The last statement produces the following error in the command prompt session running Jupyter:

Fatal Python error: GC object already tracked

From reading similar questions, this may be an issue in dask's source code related to how Python manages memory, but I'm hoping I'm just missing something.

I previously had an issue with headers and dask in this tutorial and had to run:

pip install git+https://github.com/blaze/dask.git --upgrade

Similar questions that do not help:

Fatal Python error: GC object already tracked

Debugging Python Fatal Error: GC Object already Tracked

bronstad
Try this with the previous version of Pandas. I believe that Pandas 0.17.1 introduced some thread-unsafe features. Try `pip install pandas==0.17.0` – MRocklin Dec 07 '15 at 15:54
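The comment singles out pandas 0.17.1. A small stdlib-only check, run in a notebook cell so it reports the pandas that the kernel actually imports, can show whether that release is the one in use. This is a sketch: `parse_release` and `is_suspect_pandas` are hypothetical helper names, and the check flags only the one version named in the comment, not every release that might be affected.

```python
import re

# Release named in the comment as having introduced thread-unsafe features.
SUSPECT_RELEASE = (0, 17, 1)


def parse_release(version):
    """Turn '0.17.1' or '0.17.1+0.gabc123' into (0, 17, 1); '0.17' becomes (0, 17, 0)."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError("unrecognised version string: %r" % version)
    major, minor, micro = match.groups()
    return (int(major), int(minor), int(micro or 0))


def is_suspect_pandas(version):
    """Return True only for the release the comment reports as problematic."""
    return parse_release(version) == SUSPECT_RELEASE


if __name__ == "__main__":
    try:
        import pandas
    except ImportError:
        print("pandas is not installed in this interpreter")
    else:
        if is_suspect_pandas(pandas.__version__):
            print("pandas %s: consider `pip install pandas==0.17.0` or upgrading" % pandas.__version__)
        else:
            print("pandas %s" % pandas.__version__)
```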

1 Answer


Some versions of Pandas do not handle multiple threads well, especially in pandas.read_csv. These issues are fixed in recent versions of Pandas, so this problem can probably be resolved with one of the following:

conda install pandas

pip install pandas --upgrade
MRocklin