
I am trying to run the Twitter Search and Stream APIs in separate processes, but I am unable to start any of the processes. When I try the same setup with simpler functions, the start() method works.

I would really appreciate it if you could tell me how to resolve this issue!

Thank you!

Sincerely, Terence

Here is a part of my code to start the process:

    import logging
    import multiprocessing
    from multiprocessing import Event, Process, Queue

    queue = Queue()
    exitEvent = Event()
    multiprocessing.log_to_stderr(logging.DEBUG)

    print "-- starting tweetParser! --"
    tweetParser = Process(target = parseTweets, args = (conn, cursor, queue, exitEvent, config['debug'],))
    tweetParser.daemon = True
    tweetParser.start()

    print "-- starting twitterStream! --"
    twitterStream = Process(target = StreamingTwitter, args = (streamAPI, queue, exitEvent, config['keywords'], config['time'], config['debug'],))
    twitterStream.daemon = True
    twitterStream.start()

    print "-- starting twitterSearch! --"
    twitterSearch = Process(target = SearchingTwitter, args = (searchAPI, queue, exitEvent, config['keywords'], config['time'], config['debug'],))
    twitterSearch.start()

    tweetParser.join()
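On Windows, `multiprocessing` pickles the `Process` target and every argument when `start()` is called, so a quick way to narrow down the culprit is to try pickling each argument by itself. A minimal sketch (the sample arguments here are illustrative, not the real `conn`/`cursor`/API objects):

```python
import pickle

def find_unpicklable(args):
    """Try to pickle each argument individually; report the ones that fail."""
    bad = []
    for i, arg in enumerate(args):
        try:
            pickle.dumps(arg)
        except Exception as e:
            bad.append((i, type(arg).__name__, str(e)))
    return bad

# Illustrative args: an int and a string pickle fine, a lambda does not.
problems = find_unpicklable((42, "keyword", lambda x: x))
for index, type_name, message in problems:
    print("argument %d (%s): %s" % (index, type_name, message))
```

Running this over each real argument tuple should point at whichever object (likely the database connection or one of the API handles) is dragging the unpicklable `NoneType` reference along.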

Here is the full traceback of my error:

Traceback (most recent call last):
  File "C:\Python27\Code\tweets\Stream.py", line 620, in <module>
    tweetParser.start()
  File "C:\Python27\lib\multiprocessing\process.py", line 130, in start
    self._popen = Popen(self)
  File "C:\Python27\lib\multiprocessing\forking.py", line 277, in __init__
    dump(process_obj, to_child, HIGHEST_PROTOCOL)
  File "C:\Python27\lib\multiprocessing\forking.py", line 199, in dump
    ForkingPickler(file, protocol).dump(obj)
  File "C:\Python27\lib\pickle.py", line 224, in dump
    self.save(obj)
  File "C:\Python27\lib\pickle.py", line 331, in save
    self.save_reduce(obj=obj, *rv)
  File "C:\Python27\lib\pickle.py", line 425, in save_reduce
    save(state)
  File "C:\Python27\lib\pickle.py", line 286, in save
    f(self, obj) # Call unbound method with explicit self
  File "C:\Python27\lib\pickle.py", line 655, in save_dict
    self._batch_setitems(obj.iteritems())
  File "C:\Python27\lib\pickle.py", line 687, in _batch_setitems
    save(v)
  File "C:\Python27\lib\pickle.py", line 286, in save
    f(self, obj) # Call unbound method with explicit self
  File "C:\Python27\lib\pickle.py", line 568, in save_tuple
    save(element)
  File "C:\Python27\lib\pickle.py", line 331, in save
    self.save_reduce(obj=obj, *rv)
  File "C:\Python27\lib\pickle.py", line 425, in save_reduce
    save(state)
  File "C:\Python27\lib\pickle.py", line 286, in save
    f(self, obj) # Call unbound method with explicit self
  File "C:\Python27\lib\pickle.py", line 655, in save_dict
    self._batch_setitems(obj.iteritems())
  File "C:\Python27\lib\pickle.py", line 687, in _batch_setitems
    save(v)
  File "C:\Python27\lib\pickle.py", line 286, in save
    f(self, obj) # Call unbound method with explicit self
  File "C:\Python27\lib\pickle.py", line 655, in save_dict
    self._batch_setitems(obj.iteritems())
  File "C:\Python27\lib\pickle.py", line 686, in _batch_setitems
    save(k)
  File "C:\Python27\lib\pickle.py", line 286, in save
    f(self, obj) # Call unbound method with explicit self
  File "C:\Python27\lib\pickle.py", line 754, in save_global
    (obj, module, name))
pickle.PicklingError: Can't pickle <type 'NoneType'>: it's not found as __builtin__.NoneType
[INFO/MainProcess] process shutting down
[DEBUG/MainProcess] running all "atexit" finalizers with priority >= 0
[DEBUG/MainProcess] running the remaining "atexit" finalizers
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "C:\Python27\lib\multiprocessing\forking.py", line 381, in main
    self = load(from_parent)
  File "C:\Python27\lib\pickle.py", line 1384, in load
    return Unpickler(file).load()
  File "C:\Python27\lib\pickle.py", line 864, in load
    dispatch[key](self)
  File "C:\Python27\lib\pickle.py", line 886, in load_eof
    raise EOFError
EOFError
[INFO/Process-1] process shutting down
[DEBUG/Process-1] running all "atexit" finalizers with priority >= 0
[DEBUG/Process-1] running the remaining "atexit" finalizers

1 Answer


You are having serialization problems: standard `pickle` can't serialize everything `multiprocessing` tries to send to the child process (here it trips over a `NoneType`). However, `multiprocess` can. It's a fork of `multiprocessing` that uses `dill` for better serialization.

>>> import multiprocess
>>> multiprocess.Pool().map(lambda x: x, [None, None, None])
[None, None, None]
>>> 

That might be all you need.
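If installing a third-party package is not an option, the standard-`multiprocessing` route is to stop passing live resources (database connections, API handles) as `Process` arguments and create them inside the child instead, so only picklable primitives cross the process boundary. A sketch using `sqlite3` as a stand-in for the real connection (the names and schema are illustrative):

```python
import pickle
import sqlite3

def parse_tweets(db_path, tweets):
    # Build the unpicklable resource *inside* the child process;
    # only the path string and plain data cross the pickle boundary.
    conn = sqlite3.connect(db_path)
    cursor = conn.cursor()
    cursor.execute("CREATE TABLE IF NOT EXISTS tweets (text TEXT)")
    cursor.executemany("INSERT INTO tweets VALUES (?)", [(t,) for t in tweets])
    conn.commit()
    count = cursor.execute("SELECT COUNT(*) FROM tweets").fetchone()[0]
    conn.close()
    return count

# In the parent you would then write, for example:
#   Process(target=parse_tweets, args=("tweets.db", batch)).start()

# Why this matters: the live connection is unpicklable, its path is not.
conn = sqlite3.connect(":memory:")
try:
    pickle.dumps(conn)
    connection_pickles = True
except Exception:
    connection_pickles = False

print("live connection pickles:", connection_pickles)
print("rows written:", parse_tweets(":memory:", ["hello", "world"]))
```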

Mike McKerns
  • Hello Mike, thank you a lot for your reply! However, I cannot find the `multiprocess` package for Python 2.7. Is there any way to solve the pickle issue with the `multiprocessing` module? – user3863316 Aug 16 '16 at 13:25
  • @user3863316: `multiprocess` is a 3rd party module. you'd need to install it with `pip install multiprocess`. Note, it is solvable within standard `multiprocessing`, you will just have to make sure every object you use is pickleable with `cPickle`. Doing so involves some meticulous code restructuring and object registration (to the pickle registry). Unless you can't install a 3rd party package, then I'd try the `multiprocess` option first. I'm the package author, so I'm somewhat biased, but nonetheless, it should be the far easier route. – Mike McKerns Aug 16 '16 at 16:03
  • Hello Mike! I downloaded your package through pip, but there is an error: Traceback (most recent call last): File "C:\Python27\Code\test.py", line 1, in <module> import multiprocess File "C:\Python27\lib\site-packages\multiprocess\__init__.py", line 84, in <module> import _multiprocess as _multiprocessing ImportError: No module named _multiprocess – user3863316 Aug 16 '16 at 18:36
  • Let me guess... you are on Windows. You need a C compiler. See: https://wiki.python.org/moin/WindowsCompilers. – Mike McKerns Aug 16 '16 at 20:26
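For the "object registration" route mentioned in the comments: the `copyreg` module (`copy_reg` in Python 2) lets you tell `pickle` how to reduce an otherwise-unpicklable object to picklable pieces and rebuild it on the other side. A minimal sketch with a hypothetical `Connection` class standing in for the real API/database objects:

```python
import copyreg  # `copy_reg` in Python 2
import pickle

class Connection(object):
    """Stand-in for a resource whose internals pickle cannot handle."""
    def __init__(self, host):
        self.host = host
        self.callback = lambda data: data  # lambdas are not picklable

def reduce_connection(conn):
    # Rebuild a Connection from its host alone; the live internals
    # are recreated by __init__ on the receiving side.
    return (Connection, (conn.host,))

copyreg.pickle(Connection, reduce_connection)

original = Connection("api.twitter.com")
clone = pickle.loads(pickle.dumps(original))
print(clone.host)
```

Without the `copyreg.pickle(...)` registration, pickling `original` would fail on the lambda attribute; with it, only the host string crosses the boundary.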