I have an image-processing service containing two methods, which I want to execute in parallel using the multiprocessing library in Python.

The first method makes an API call to fetch image metadata from an external service.

The second method uses an object of a class that performs some complex operations, such as reading an image with the opencv library and classifying the image with a sklearn model.

The first function looks like this:

def function_1():
  # perform long-running API call
  ...

and this is my second function:

def function_2(image_proc_obj):
  predictions = image_proc_obj.predict()

When I call these two methods using multiprocessing.Process as shown below:

image_proc_obj = ImageProcessingClass()
p1 = multiprocessing.Process(target=function_1)
p2 = multiprocessing.Process(target=function_2, args=(image_proc_obj,))

I get this error: ValueError: ctypes objects containing pointers cannot be pickled

I am passing image_proc_obj to the second function because the constructor of this class loads the model file, and I don't want that to happen on every function call.

I also tried creating a class that subclasses multiprocessing.Process, like this:

class ImageClassifier(multiprocessing.Process):
   def __init__(self, process_obj, **kwargs):
      # forward name= and other Process keyword arguments
      super().__init__(**kwargs)
      self.proc_obj = process_obj

   def run(self):
      # Process.start() calls run() without arguments
      predictions = self.proc_obj.predict()

But on running the commands as shown below:

image_proc_obj = ImageProcessingClass()
classifier = ImageClassifier(name="classifier process", process_obj=image_proc_obj)
classifier.start()
classifier.join()

I get the same error: ValueError: ctypes objects containing pointers cannot be pickled

Looking forward to some help with this.

user3666197
  • As the error indicates, the issue is that *the object* cannot be pickled ("sent to the subprocess"). We won't be able to help you if we don't know what that object is. – MisterMiyagi Jan 13 '22 at 09:00
  • Hello @MisterMiyagi The object is instantiated from a class called ImageProcessingClass. This class loads an sklearn-based RFC model in its __init__ call and also has a method called predict, which I am using in function_2 as mentioned above – Abhishek Bose Jan 13 '22 at 09:06
  • A few observations: In general, do not have `ImageProcessing` class (you do not need to append "Class" to the name) subclass `Process` (*if* you need to create a new `Process`, then do it with the *target* argument for greater flexibility). It also seems to me that you do not need multiprocessing at all. Fetching the metadata is mostly network waiting and can be done with a `Thread` and the main process of the program can do the heavy CPU processing concurrently. That is, there is no need to start a new process. – Booboo Jan 13 '22 at 11:28
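
A minimal sketch of the thread-based layout suggested in the last comment, assuming the metadata fetch is mostly network waiting. The names `fetch_metadata`, `DummyImageProcessor` and `run_both` are hypothetical stand-ins, not part of the original code; the sleep only simulates the API call.

```python
import threading
import time


def fetch_metadata(results):
    # Hypothetical stand-in for the long-running API call (network wait).
    time.sleep(0.1)
    results["metadata"] = {"source": "placeholder"}


class DummyImageProcessor:
    # Hypothetical stand-in for ImageProcessingClass; a real one would
    # load the model once in __init__.
    def predict(self):
        return [0, 1, 1]


def run_both(image_proc_obj):
    results = {}
    # The API call waits in a background thread ...
    fetcher = threading.Thread(target=fetch_metadata, args=(results,))
    fetcher.start()
    # ... while the main thread runs the prediction.
    results["predictions"] = image_proc_obj.predict()
    fetcher.join()
    return results


if __name__ == "__main__":
    print(run_both(DummyImageProcessor()))
```

Because the object never leaves the process, nothing has to be pickled, so the ctypes error cannot occur on this path.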

1 Answer

I designed, and still successfully use, the same kind of process-to-process, low-latency communication with the strategy described below, built with minimum latency and the shortest possible turnaround time (TAT) in mind, for remote sklearn .predict() calls. It serves production-grade "remote" predictions on p2p requests sampled at sub-millisecond intervals, and it has worked like a charm for about six years.

Q : " How to pass an object to a process ... ? "

A :
Easy.
To pass an object instance into another, independent process, you first need to prepare a so-called serialised representation of the object that lives in the original process.

Once a serialise / transfer / deserialise (SER/DES) path exists, the original object's data can reach the "remote" process.
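
A small sketch of that SER/DES path using the standard pickle module, and of where it breaks. The helper names `roundtrip` and `can_pickle` are hypothetical; the ctypes pointer is only a stand-in for whatever native handle the real object holds.

```python
import ctypes
import pickle


def roundtrip(obj):
    payload = pickle.dumps(obj)   # SER: object -> bytes
    return pickle.loads(payload)  # DES: bytes -> new, equal object


def can_pickle(obj):
    # Report whether the standard pickler accepts this object.
    try:
        pickle.dumps(obj)
        return True
    except (ValueError, TypeError, pickle.PicklingError):
        return False


if __name__ == "__main__":
    print(roundtrip({"weights": [1.0, 2.0]}))
    # A ctypes pointer raises the ValueError quoted in the question.
    print(can_pickle(ctypes.pointer(ctypes.c_int(42))))
```

Plain data survives the round trip; an object that holds a ctypes pointer anywhere in its attributes does not, which is what multiprocessing runs into when it has to pickle the arguments of a Process.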

The transfer tool may be a Python-native Queue (which uses pickle for SER/DES) or any other tool (such as nanomsg, ZeroMQ (pyzmq), pynng, or raw sockets), though with those you have to perform the SER/DES transformations programmatically.
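
To illustrate the Queue variant, here is a sketch that pushes a payload through a multiprocessing.Queue. For brevity both ends live in one process; the same put() and get() calls would be used between a parent and a child. The function name `send_and_receive` is hypothetical.

```python
import multiprocessing as mp


def send_and_receive(obj):
    q = mp.Queue()
    q.put(obj)                   # pickled by the queue's feeder thread
    received = q.get(timeout=5)  # unpickled on the reading side
    q.close()
    q.join_thread()
    return received


if __name__ == "__main__":
    data = {"pixels": [1, 2, 3]}
    copy = send_and_receive(data)
    print(copy == data, copy is data)
```

The receiver gets an equal copy, not the same object, so anything placed on the queue has to be picklable.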

Where pickle.dump() commonly fails, there may be a chance to use Mike McKerns' dill package: just start with import dill as pickle, and the pickle.dump() calls in the source code can stay as they are.
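
A sketch of that drop-in import, assuming dill is installed (it is a third-party package); the fallback to the standard pickle is my addition so the snippet still runs without it. Whether dill can serialise a particular object, such as one holding ctypes pointers, has to be tried case by case.

```python
try:
    import dill as pickle  # third-party; assumed installed
except ImportError:
    import pickle  # fallback (assumption) so the sketch still runs


def roundtrip(obj):
    # Same calls as before; only the module behind the name changed.
    return pickle.loads(pickle.dumps(obj))


if __name__ == "__main__":
    print(roundtrip({"k": (1, 2)}))
    try:
        # dill can serialise a lambda; the standard pickle cannot.
        square = roundtrip(lambda x: x * x)
        print(square(4))
    except Exception as exc:
        print("not picklable with this module:", exc)
```

Note that this only changes the pickle calls in your own code, for example when you serialise explicitly before sending bytes over a socket.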


Nota bene:
once OpenCV payloads come into consideration, be warned that the Python interpreter just points (refers) to OpenCV-native Mat memory, so it is better to use a numpy "flattened" re-representation of the image data when moving it to the other, "remote", process.
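
One way to read "flattened" is raw bytes plus the shape and dtype needed to rebuild the array. A sketch assuming numpy is available; `pack_image` and `unpack_image` are hypothetical helper names, and the small array stands in for an image read by OpenCV.

```python
import numpy as np


def pack_image(image):
    # Make the data contiguous, then ship bytes plus shape and dtype.
    arr = np.ascontiguousarray(image)
    return arr.tobytes(), arr.shape, arr.dtype.str


def unpack_image(raw, shape, dtype):
    # Rebuild the array on the receiving side (read-only view of raw;
    # call .copy() if a writable array is needed).
    return np.frombuffer(raw, dtype=dtype).reshape(shape)


if __name__ == "__main__":
    img = np.arange(24, dtype=np.uint8).reshape(2, 4, 3)
    restored = unpack_image(*pack_image(img))
    print(np.array_equal(img, restored))
```

The three returned values are plain bytes, a tuple and a string, so they pass through pickle, a Queue or a socket without trouble.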

halfer
user3666197