I am trying to use a simple apply on an SFrame full of data. It is a simple transform on one of the columns: the applied function takes a text input, splits it into words, and counts them. Here is the function and its call/output:

    In [1]: def count_words(txt):
           count = Counter()
           for word in txt.split():
               count[word]+=1
           return count

    In [2]: products.apply(lambda x: count_words(x['review']))

    ---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-8-85338326302c> in <module>()
----> 1 products.apply(lambda x: count_words(x['review']))

C:\Anaconda3\envs\dato-env\lib\site-packages\graphlab\data_structures\sframe.pyc in apply(self, fn, dtype, seed)
   2607 
   2608         with cython_context():
-> 2609             return SArray(_proxy=self.__proxy__.transform(fn, dtype, seed))
   2610 
   2611     def flat_map(self, column_names, fn, column_types='auto', seed=None):

C:\Anaconda3\envs\dato-env\lib\site-packages\graphlab\cython\context.pyc in __exit__(self, exc_type, exc_value, traceback)
     47             if not self.show_cython_trace:
     48                 # To hide cython trace, we re-raise from here
---> 49                 raise exc_type(exc_value)
     50             else:
     51                 # To show the full trace, we do nothing and let exception propagate

RuntimeError: Runtime Exception. Unable to evaluate lambdas. Lambda workers did not start.

Running the code produces the error above. The SFrame is only 10 by 2, so the data size should not be overloading anything. I don't know how to fix this issue.

Brian Tompsett - 汤莱恩
rgalbo
  • Don't you want `products['review'].apply(count_words)`? – EdChum Dec 10 '15 at 11:48
  • I got the same runtime error with Graphlab 2.1. I then reran my code block and it executed with no error. Not sure why, but it could be worth a try for anyone else experiencing the same error. – garbo999 Jul 29 '16 at 19:44

2 Answers

If you're using GraphLab Create, there is actually a built-in tool for doing this, in the "text analytics" toolkit. Let's say I have data like:

    import graphlab
    products = graphlab.SFrame({'review': ['a portrait of the artist as a young man',
                                           'the sound and the fury']})

The easiest way to count the words in each entry is

    products['counts'] = graphlab.text_analytics.count_words(products['review'])

If you're using the sframe package by itself, or if you want to do a custom function like the one you described, I think the key missing piece in your code is that the Counter needs to be converted into a dictionary in order for the SFrame to handle the output.

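    from collections import Counter

    def count_words(txt):
        # Tally each whitespace-separated word in the text
        count = Counter()
        for word in txt.split():
            count[word] += 1
        # Return a plain dict so the SFrame can store the result
        return dict(count)

    # Apply row by row; each row x is passed in with its columns accessible by name
    products['counts'] = products.apply(lambda x: count_words(x['review']))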
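
To check the function on its own, outside of any SFrame, it can be run over a plain Python list. This is only a minimal sketch: the `reviews` list is a stand-in for the column, and it confirms only that the function returns a plain `dict`, not that the lambda workers will start.

```python
from collections import Counter

def count_words(txt):
    # Counter tallies each whitespace-separated word; dict() makes it a plain dict
    return dict(Counter(txt.split()))

# Stand-in for the 'review' column (same sample sentences as above)
reviews = ['a portrait of the artist as a young man',
           'the sound and the fury']

counts = [count_words(r) for r in reviews]
print(counts)
```

If this runs cleanly, the function itself is fine and the error most likely comes from the environment rather than from the code being applied.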
papayawarrior
  • I still get a runtime error when returning `dict(count)`. Thank you, I was aware of `graphlab.text_analytics`, but I wanted to use my own solution, which isn't working out too well. – rgalbo Dec 12 '15 at 09:52

For anyone who comes across this issue while using GraphLab, here is the discussion thread about it on the Dato support forum:

http://forum.dato.com/discussion/1499/graphlab-create-using-anaconda-ipython-notebook-lambda-workers-did-not-start

The code below can be run as a workaround for this issue; it has to be run again in each new session.

After starting ipython or ipython notebook in the Dato/GraphLab environment, but before importing graphlab yourself (the snippet does the import), copy and run the following code:

    import ctypes, inspect, os, graphlab
    from ctypes import wintypes

    # Windows only: point the DLL search path at the graphlab package directory
    kernel32 = ctypes.WinDLL('kernel32', use_last_error=True)
    kernel32.SetDllDirectoryW.argtypes = (wintypes.LPCWSTR,)
    src_dir = os.path.split(inspect.getfile(graphlab))[0]
    kernel32.SetDllDirectoryW(src_dir)

    # Should work
    graphlab.SArray(range(1000)).apply(lambda x: x)

Once this has been run, the apply function should work fine with an SFrame.
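
Since `ctypes.WinDLL` exists only on Windows, the same call can be wrapped so that it is skipped on other platforms. This is a rough sketch, not part of the Dato thread: the helper name `set_dll_search_dir` is made up for illustration, and the directory is passed in rather than derived from the graphlab package.

```python
import ctypes
import os

def set_dll_search_dir(path):
    """Hypothetical helper: add `path` to the Windows DLL search path.

    Returns True if the call succeeded, False on failure or on non-Windows systems.
    """
    if os.name != 'nt':
        # Not Windows: there is no kernel32 to call, so do nothing
        return False
    from ctypes import wintypes
    kernel32 = ctypes.WinDLL('kernel32', use_last_error=True)
    kernel32.SetDllDirectoryW.argtypes = (wintypes.LPCWSTR,)
    kernel32.SetDllDirectoryW.restype = wintypes.BOOL
    # SetDllDirectoryW returns nonzero on success
    return bool(kernel32.SetDllDirectoryW(path))

result = set_dll_search_dir(os.getcwd())
print(result)
```

In the GraphLab case, the argument would be the package directory computed as `src_dir` above.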

rgalbo