I am trying to push a pandas DataFrame of 1MM rows and 35 columns to the IPython parallel engines through a DirectView. However, I am having trouble pushing this data (or even an empty DataFrame) to the engines: my function fails to print the length of the DataFrame. Here is a snippet of my code.

ipcluster start -n 4

from IPython.parallel import Client  # import path for IPython 0.13
from pandas import DataFrame

def myfn():
  rc = Client()
  dview = rc[:]  # DirectView over all engines
  data = ..... #queried from some source of 1MM rows
  dview.push(dict(data=data,new=DataFrame()))
  async = dview.map_async(f,range(3))

  return async

def f(n):
  test = DataFrame()
  x = len(data) # type data = pandas.core.frame.DataFrame
  #print len(test) #works fine, gets three "0"s
  #print len(new)  # empty DF, gets an error below
  print len(data)  # 1MM row DF, gets an error below
  return x

After looking at async.stdout, this is the error I received. Any help is appreciated!

In [205]: x1.stdout
Out[205]:
[u'Traceback (most recent call last):\n  File "/myProj/ipython/0.13.2-py27/lib/IPython/core/ultratb.py", line 760, in structured_traceback\n    records = _fixed_getinnerframes(etb, context, tb_offset)\n  File "/myProj/ipython/0.13.2-py27/lib/IPython/core/ultratb.py", line 242, in _fixed_getinnerframes\n    records  = fix_frame_records_filenames(inspect.getinnerframes(etb, context))\n  File "//myProj/core/2.7.3-64/lib/python2.7/inspect.py", line 1043, in getinnerframes\n    framelist.append((tb.tb_frame,) + getframeinfo(tb, context))\n  File "//myProj/core/2.7.3-64/lib/python2.7/inspect.py", line 1007, in getframeinfo\n    lines, lnum = findsource(frame)\n  File "//myProj/core/2.7.3-64/lib/python2.7/inspect.py", line 580, in findsource\n    if pat.match(lines[lnum]): break\nIndexError: list index out of range\n',
 u'Traceback (most recent call last):\n  File "/myProj/ipython/0.13.2-py27/lib/IPython/core/ultratb.py", line 760, in structured_traceback\n    records = _fixed_getinnerframes(etb, context, tb_offset)\n  File "/myProj/ipython/0.13.2-py27/lib/IPython/core/ultratb.py", line 242, in _fixed_getinnerframes\n    records  = fix_frame_records_filenames(inspect.getinnerframes(etb, context))\n  File "//myProj/core/2.7.3-64/lib/python2.7/inspect.py", line 1043, in getinnerframes\n    framelist.append((tb.tb_frame,) + getframeinfo(tb, context))\n  File "//myProj/core/2.7.3-64/lib/python2.7/inspect.py", line 1007, in getframeinfo\n    lines, lnum = findsource(frame)\n  File "//myProj/core/2.7.3-64/lib/python2.7/inspect.py", line 580, in findsource\n    if pat.match(lines[lnum]): break\nIndexError: list index out of range\n',
 u'Traceback (most recent call last):\n  File "/myProj/ipython/0.13.2-py27/lib/IPython/core/ultratb.py", line 760, in structured_traceback\n    records = _fixed_getinnerframes(etb, context, tb_offset)\n  File "/myProj/ipython/0.13.2-py27/lib/IPython/core/ultratb.py", line 242, in _fixed_getinnerframes\n    records  = fix_frame_records_filenames(inspect.getinnerframes(etb, context))\n  File "//myProj/core/2.7.3-64/lib/python2.7/inspect.py", line 1043, in getinnerframes\n    framelist.append((tb.tb_frame,) + getframeinfo(tb, context))\n  File "//myProj/core/2.7.3-64/lib/python2.7/inspect.py", line 1007, in getframeinfo\n    lines, lnum = findsource(frame)\n  File "//myProj/core/2.7.3-64/lib/python2.7/inspect.py", line 580, in findsource\n    if pat.match(lines[lnum]): break\nIndexError: list index out of range\n']
cs_newbie
  • Try upgrading to IPython stable (1.1), or even master to see if it is reproducible. – Matt Nov 13 '13 at 19:32
  • I am able to access 'data' now, at least to print the length of the DF. However, I receive the same error when I try to actually use it, e.g. data=data.xs('0','b'). Any ideas? (It's the company IPython, so I can't update.) – cs_newbie Nov 13 '13 at 20:07

1 Answer

There is a bug in IPython 0.13 that causes failed serialization of DataFrames. It is fixed in IPython 1.0, so the problem should be resolved by an upgrade. If you can't upgrade for some reason, then you will have to serialize DataFrames yourself, most easily by pickling before handing the object to IPython, and unpickling on the other side. Obviously, just upgrading would be preferable if possible.
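A rough sketch of the manual pickling workaround, in case it helps. The helper names `pack`, `unpack`, and `row_count` and the variable name `payload` are made up for illustration, and a plain list stands in for the DataFrame so the snippet runs without pandas or a cluster. On a real cluster you would push the byte string with the same `dview.push` call as in the question, and the engine-side function would need to import `pickle` itself.

```python
import pickle


def pack(obj):
    """Client side: turn the object into a plain byte string before pushing."""
    # Protocol 2 is the highest protocol that Python 2.7 understands.
    return pickle.dumps(obj, protocol=2)


def unpack(payload):
    """Engine side: rebuild the original object from the pushed bytes."""
    return pickle.loads(payload)


def row_count(payload):
    """Example engine task: unpickle first, then work with the object."""
    data = unpack(payload)
    return len(data)


# Stand-in for the real DataFrame (assumption: any picklable object works the same way).
records = [{"a": i, "b": str(i)} for i in range(5)]
payload = pack(records)

# With a running cluster, the push would look like (not executed here):
#   dview.push(dict(payload=payload))
# and f(n) on the engines would call pickle.loads(payload) before touching the data.
count = row_count(payload)
```

The point is that IPython then only has to ship a byte string, which it handles without going through the broken DataFrame serialization path. The cost is a full extra copy of the data in memory on both sides while pickling and unpickling.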

minrk