
I would like to run a continuous stream with the PiCamera on the Raspberry Pi 3 and also do other computations in parallel with this stream.

All I need from that stream (process) is the object it detected. I will post the code I have so far below. It never enters the computation() function; it just starts the camera, detects the objects there, and stays in that process.

I've tried using Python's multiprocessing module, but it doesn't seem to work.

from picamera import PiCamera
from picamera.array import PiRGBArray
from multiprocessing import Process
import cv2
import numpy as np

# ( the TF session, detection tensors, category_index, IM_WIDTH / IM_HEIGHT and camera_type
#   are set up earlier in the full script and are omitted here )

def startRecord():
    frame_rate_calc = 1
    freq = cv2.getTickFrequency()
    font = cv2.FONT_HERSHEY_SIMPLEX
    camera = PiCamera()
    camera.resolution = (IM_WIDTH, IM_HEIGHT)
    camera.framerate = 10
    camera.vflip = True
    rawCapture = PiRGBArray(camera, size=(IM_WIDTH, IM_HEIGHT))
    rawCapture.truncate(0)

    for frame1 in camera.capture_continuous(rawCapture, format="bgr", use_video_port=True):
        object_detected = "none"
        t1 = cv2.getTickCount()

        # Acquire frame and expand frame dimensions to have shape: [1, None, None, 3]
        # i.e. a single-column array, where each item in the column has the pixel RGB value
        frame = np.copy(frame1.array)
        frame.setflags(write=1)
        frame_expanded = np.expand_dims(frame, axis=0)

        # Perform the actual detection by running the model with the image as input
        (boxes, scores, classes, num) = sess.run(
            [detection_boxes, detection_scores, detection_classes, num_detections],
            feed_dict={image_tensor: frame_expanded})

        # Draw the results of the detection (aka 'visualize the results')
        vis_util.visualize_boxes_and_labels_on_image_array(
            frame,
            np.squeeze(boxes),
            np.squeeze(classes).astype(np.int32),
            np.squeeze(scores),
            category_index,
            use_normalized_coordinates=True,
            line_thickness=8,
            min_score_thresh=0.40)

        if classes[0][0] == 1 and scores[0][0] > 0.98:
            object_detected = "circle"
        elif classes[0][0] == 2 and scores[0][0] > 0.98:
            object_detected = "donnut"
        elif classes[0][0] == 3 and scores[0][0] > 0.98:
            object_detected = "square"
        elif classes[0][0] == 4 and scores[0][0] > 0.98:
            object_detected = "alphabot"

        cv2.putText(frame, "FPS: {0:.2f}".format(frame_rate_calc), (30, 50), font, 1, (255, 255, 0), 2, cv2.LINE_AA)

        # All the results have been drawn on the frame, so it's time to display it.
        cv2.imshow('Object detector', frame)

        t2 = cv2.getTickCount()
        time1 = (t2 - t1) / freq
        frame_rate_calc = 1 / time1

        # Press 'q' to quit
        if cv2.waitKey(1) == ord('q'):
            break

        rawCapture.truncate(0)
    camera.close()


def computation():
    print("OUTSIDE OF CAPTURE")
    print(object_detected)
### Picamera ###
if camera_type == 'picamera':
    # Initialize Picamera and grab reference to the raw capture
    p1 = Process(target=startRecord())
    p2 = Process(target=computation())
    p1.start()
    p2.start()


    p1.join()
    p2.join()

1 Answer


Your intent is clear, yet your posted code will need some polishing, and the RPi3 will be the hardest part of the journey:

  • The multiprocessing module is indeed capable of spawning a pool of sub-processes, yet at an immense cost - both in [PSPACE], because it populates a FULL-COPY of the Python interpreter session for each sub-process ( statefully, and that grabs immense amounts of RAM for each replicated copy ), and in [PTIME], because spawning takes the more time the more sub-processes one launches ( a one-time cost, yet very important for overhead-strict, Amdahl's-Law re-formulated maximum parallel speedup evaluations ).

  • There is not more than 1 GB of RAM on the RPi3, so either your TF-model has to be indeed miniature, or you have to carefully spawn the TF-dedicated sub-process in a way that does not replicate the same TF-BLOBs into the main session ( and does not leave the same BLOBs there ) - quite a tricky part.

  • Still, you need coordination between both ends of your intended distributed processing. One process cannot put its hand into the pocket of another process and take any value without the neighbour's explicit will ( and control ). At the moment: -- one side, invoked as p1 = Process( target = startRecord() ), never calls computation(); worse, writing target = startRecord() calls the function right away, inside the main process, before any sub-process is even created - which is exactly why your script starts the camera and never gets past that loop ( the fix is to pass the function itself: target = startRecord ) -- the other side, invoked as p2 = Process( target = computation() ), never receives a single piece of data from anywhere else ( and by its design it does not even require to ): the only variable it references, object_detected, was pre-copied from the main session in whatever state it had there ( if it existed at all ) at the moment of sub-process instantiation, and has become a fully-separated replica, intentionally un-connected and un-coordinated w.r.t. any now-"external" changes - this very set of features is what liberates a sub-process from the central main-session's GIL-lock coordination overheads and lets sub-processes work independently, enjoying increased levels of [CONCURRENT] process-execution. A minimal coordination sketch follows right below this list.
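As an illustration of that last point, a minimal sketch ( not your full detection code - the TF-specific part is reduced to a purely hypothetical detect_object() placeholder ) could hand each detected label over through a multiprocessing.Queue:

from multiprocessing import Process, Queue

def startRecord(q):
    # ... PiCamera / TF setup and the capture_continuous loop as in the original code ...
    while True:
        object_detected = detect_object()   # hypothetical stand-in for the per-frame inference
        q.put(object_detected)              # hand the label over to the other process

def computation(q):
    while True:
        object_detected = q.get()           # blocks until the camera process sends the next label
        print("OUTSIDE OF CAPTURE", object_detected)

if __name__ == '__main__':
    q  = Queue()
    p1 = Process(target=startRecord, args=(q,))   # note: no "()" - pass the function, do not call it
    p2 = Process(target=computation, args=(q,))
    p1.start(); p2.start()
    p1.join();  p2.join()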

Solution:

a) verify whether the model's and the sub-processes' memory footprints fit within the RPi3 RAM constraints ( a rough check is sketched below )
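Such a rough sizing check could look like this, assuming the frozen graph lives at a hypothetical frozen_inference_graph.pb path:

import os

MODEL_PATH = 'frozen_inference_graph.pb'        # hypothetical path to the frozen TF graph
model_mb = os.path.getsize(MODEL_PATH) / 1e6

with open('/proc/meminfo') as f:                # read this before any TF / camera process starts
    mem_kb = int(next(line for line in f if line.startswith('MemAvailable')).split()[1])

print("model: %.1f MB, available RAM: %.1f MB" % (model_mb, mem_kb / 1e3))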

b) right-size the main session before spawning any specialised sub-processes ( keep the heavy TensorFlow state out of the parent - see the sketch below )
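One way to keep the parent session lean ( a sketch only, the graph-loading details depend on your actual setup ) is to defer the TensorFlow import into the camera sub-process itself:

def startRecord(q):
    # importing TF here keeps the large graph and session out of the parent
    # and out of the computation() sub-process
    import tensorflow as tf
    # ... load the frozen graph, create the session, run the capture / detection loop,
    #     and q.put( object_detected ) for every processed frame ...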

c) design the inter-process communications ( the default Queue / deQueue may become unfit for fast and efficient processing - you may enjoy a smarter and more controllable signalling/messaging framework like ZeroMQ's ipc:// transport for moving raw byte-blocks in an almost zero-copy manner; a sketch follows ) - yes, here again and again, the RPi3 RAM-ceiling will hurt you, both [PSPACE]-wise ( the en-Queue / de-Queue buffering overheads have to fit inside it on both sides ) and [PTIME]-wise ( the add-on latencies will grow, possibly beyond your process-control's acceptable thresholds )
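A minimal ZeroMQ sketch of point c), assuming the pyzmq package is installed and using an arbitrary ipc:///tmp/detections endpoint ( both are illustrative choices, not taken from your post ); each half would live in its own process:

import zmq

# producer side ( inside the camera / detection process )
ctx  = zmq.Context()
push = ctx.socket(zmq.PUSH)
push.bind("ipc:///tmp/detections")
push.send_string("circle")                 # or push.send( frame.tobytes(), copy=False ) for raw frames

# consumer side ( inside the computation process )
ctx  = zmq.Context()
pull = ctx.socket(zmq.PULL)
pull.connect("ipc:///tmp/detections")
label = pull.recv_string()                 # blocks until the next detection arrives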
