
I've asked a question about multiprocessing before, but now I'm running into a weird shortcoming with the type of data that gets returned.

I'm using Gspread to interface with Google's Sheets API and get a "worksheet" object back.

This object, or some part of it, is apparently incompatible with multiprocessing because it is "unpicklable". Here is the output:

File "/usr/lib64/python2.7/multiprocessing/pool.py", line 554, in get
raise self._value

multiprocessing.pool.MaybeEncodingError: Error sending result: '[<Worksheet 'Activation Log' id:o12345wm>]'. 
Reason: 'UnpickleableError(<ssl.SSLContext object at 0x1e4be30>,)'

The code I'm using is essentially:

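import multiprocessing
from oauth2client.client import SignedJwtAssertionCredentials
import gspread

# get_a_worksheet(sheet) is defined elsewhere; it returns a gspread Worksheet.
sheet = 1
pool = multiprocessing.Pool(1)
p = pool.apply_async(get_a_worksheet, args=(sheet,))

# Fails here: the Worksheet must be pickled to be sent back to the parent.
worksheet = p.get()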

The script fails while attempting to "get" the result. The get_a_worksheet function returns a Gspread worksheet object that lets me manipulate the remote sheet. Being able to upload changes to the document is important here: I'm not just trying to read data, I need to alter it as well.

Does anyone know how I can run this work in a separate, monitorable process and safely get an arbitrary (or custom) object type out of it at the end? And what makes the ssl.SSLContext object special and "unpicklable"?

Thanks all in advance.

Locane
  • Your example is not complete enough for someone to try, but it does show the critical issue: `SSLContext` is unpicklable. There is a fork of multiprocessing (called `multiprocess`) that can pickle almost all Python objects, but it also fails on an `SSLContext` object. You might still be able to modify your `get_a_worksheet` function so that it works. The first approach is to put it into another file and then import it into the file where you use the `pool`. Wherever you can take advantage of pickling by reference, you should do it. – Mike McKerns Jul 31 '15 at 00:38
  • Thanks for the comment Mike. Do you mean writing the contents to a file in the subprocess to be referenced later? If that's what you mean, that won't work since it's important that I be able to also upload changes to the remote document with this object. – Locane Jul 31 '15 at 00:45
  • No, I mean defining the function `get_a_worksheet` in its own file, then importing it into a second file that you use to run the `pool`. It's easier to pickle if they are in separate files. I can't tell from your code whether you are doing that already. If you aren't, it *might* get you past the pickling issue. – Mike McKerns Jul 31 '15 at 12:06

2 Answers


Multiprocessing uses pickling to pass objects between processes, so I do not believe you can use multiprocessing to return an object that is unpicklable.
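To see the pickling restriction in isolation, here is a minimal sketch for Python 3 (the question itself ran on Python 2.7, where the exception type differs). The helper name `can_pickle` is made up for illustration. An `SSLContext` wraps a native OpenSSL structure and defines no pickle support, so `pickle.dumps` rejects it, and any object that holds a reference to one fails the same way.

```python
import pickle
import ssl


def can_pickle(obj):
    """Return True if pickle.dumps accepts obj, False otherwise."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, TypeError, AttributeError):
        return False


# Plain data crosses the process boundary without trouble.
print(can_pickle({"title": "Activation Log"}))

# An SSLContext does not, which is why a result holding one cannot be returned.
print(can_pickle(ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)))
```

Running a check like this on a candidate return value before handing it to a pool is a quick way to find out whether it can be sent back to the parent.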

Michael

I ended up writing a solution around this shortcoming by having the sub-process simply perform the necessary work inside itself rather than return a Worksheet object.

What I ended up with was about half a dozen pairs of functions, each pair made up of a regular function and a multiprocessing worker function. Each pair does one task I needed done, but inside a sub-process so that it can be monitored and timed.

A hierarchical map would look something like:

Main()
    check_spreadsheet_for_a_string()
        check_spreadsheet_for_a_string_worker()
    get_hash_of_spreadsheet()
        get_hash_of_spreadsheet_worker()

... etc

Where the "worker" functions are the functions called in the multiprocessing setup, and the regular functions above them manage the sub-process and time it to make sure the overall program doesn't halt if the call to gspread internals hangs or takes too long.
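One such pair might look roughly like the sketch below. This is an illustration, not the actual script: the parameters `rows`, `needle`, and `timeout` are assumptions, and `rows` is a stand-in for the sheet contents that the real worker would fetch through gspread inside the sub-process. The point is that only a plain, picklable value (here a bool) is returned to the parent.

```python
import multiprocessing


def check_spreadsheet_for_a_string_worker(rows, needle):
    """Runs in the sub-process and returns a plain bool, which pickles fine.

    The real worker would open the worksheet with gspread here, so the
    unpicklable object never leaves the sub-process. `rows` is a stand-in.
    """
    return any(needle in cell for row in rows for cell in row)


def check_spreadsheet_for_a_string(rows, needle, timeout=10):
    """Runs in the parent: starts the worker and enforces a time limit.

    Returns the worker's result, or None if it did not finish in time.
    """
    pool = multiprocessing.Pool(1)
    try:
        pending = pool.apply_async(
            check_spreadsheet_for_a_string_worker, args=(rows, needle)
        )
        return pending.get(timeout=timeout)
    except multiprocessing.TimeoutError:
        return None
    finally:
        # Kill the worker whether it finished or hung, then reap it.
        pool.terminate()
        pool.join()


if __name__ == "__main__":
    demo_rows = [["alpha", "beta"], ["gamma", "Activation Log"]]
    print(check_spreadsheet_for_a_string(demo_rows, "Activation"))
```

Calling `terminate()` in the `finally` clause means a hung call to the remote API gets killed instead of stalling the whole program, which is the monitoring behaviour described above.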

Locane