
I have a program that processes a live video of some markers.

It is divided into:

  1. Import next image of video
  2. Convert Image to readable form
  3. Detection of Markers
  4. Tracking of Markers
  5. Draw UI

This is working pretty well on my PC, but it also needs to work on a Raspberry Pi, so using just one core the whole time won't cut it.

That's why I want to introduce pipelining. In my computer architecture course at university I learned about hardware pipelining, so I was wondering whether something like that could be implemented in Python:

So instead of doing Import -> Conversion -> Processing -> Tracking -> Draw -> ...

I want to do it like this:

-1----2----3----4-----5----...
Imp--Imp--Imp--Imp---Imp---...
-----Conv-Conv-Conv--Conv--...
----------Pro--Pro---Pro---...
---------------Track-Track-...
---------------------Draw--...

That way an image is ready every "clock cycle" and not only every fifth one.

I was thinking about using Python's multiprocessing library for this, but apart from some simple test programs I have no experience with it, so I am not sure what would best suit this use case, e.g. Queue, Pool, Manager, ...
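
To make the idea concrete, the sketch below is roughly the shape I have in mind, using multiprocessing.Process with a Queue between two stages (the stage functions are only placeholders, not my real code):

from multiprocessing import Process, Queue

def import_stage(out_q, n_frames):
    # placeholder for "Import next image of video"
    for i in range(n_frames):
        frame = "frame %d" % i        # stand-in for a real image
        out_q.put(frame)              # blocks while the queue is full
    out_q.put(None)                   # sentinel: no more frames

def convert_stage(in_q):
    # placeholder for "Convert Image to readable form"
    while True:
        frame = in_q.get()
        if frame is None:             # sentinel received, stop
            break
        print("converted", frame)

if __name__ == "__main__":
    q = Queue(maxsize=5)              # bounded queue keeps the stages in step
    p1 = Process(target=import_stage, args=(q, 10))
    p2 = Process(target=convert_stage, args=(q,))
    p1.start(); p2.start()
    p1.join(); p2.join()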

SOLVED:

This can be done with mpipe, a neat pipelining toolkit for Python: http://vmlaker.github.io/mpipe/

while True:
    # each stage runs its function in 3 worker processes
    stage1 = mpipe.OrderedStage(conversion, 3)
    stage2 = mpipe.OrderedStage(processing, 3)
    stage3 = mpipe.OrderedStage(tracking, 3)
    stage4 = mpipe.OrderedStage(draw_squares, 3)
    stage5 = mpipe.OrderedStage(ui, 3)

    pipe = mpipe.Pipeline(stage1.link(stage2.link(stage3.link(stage4.link(stage5)))))

    # read a few frames up front so the pipeline is filled from the start
    images = []
    while len(images) < 3:
        ret = False
        while not ret:
            ret, image = cap.read()
        images.append(image)

    # the extra information travels alongside the image as a tuple
    for i in images:
        t = (i, frame_counter, multi_tracker)
        pipe.put(t)

    pipe.put(None)  # signal that no more input is coming

    for result in pipe.results():
        image, multi_tracker, frame_counter = result
        Show.show_win("video", image)

As @r_e suggested, I read multiple images at the start and fill the pipeline with them. In every stage of the calculation multiple worker processes are started, so each one can work on a separate image.

As some additional information needs to be passed along besides the image, each stage returns the image together with that information as a tuple, and the next stage unpacks it again.
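
A stage then looks roughly like this (just a sketch; do_tracking stands in for my real code):

def tracking(input_data):
    # unpack the image and the extra information from the previous stage
    image, frame_counter, multi_tracker = input_data
    image = do_tracking(image, multi_tracker)   # placeholder for the actual work
    # pack everything up again for the next stage
    return image, frame_counter, multi_tracker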

At the moment I had to disable the tracking, so I am not able to compare it to the old version yet. Right now it is a bit slower (tracking would improve speed because I would not need to detect objects in every frame, only in every 30th). But I'll give you an update once I get it to work.

le_lemon
  • What size are the frames (width and height)? Are they colour or grayscale? – Mark Setchell Jun 25 '19 at 17:15
  • Related talk by Raymond Hettinger on Concurrency: https://www.youtube.com/watch?v=9zinZmE3Ogk – moooeeeep Jun 25 '19 at 17:15
  • I think the best you can do is to use multithreading to capture frames (Step #1) in one thread while processing (Steps #2-#4) in the main thread, since you're bound by Python's GIL. For true pipelining you would have to use multiprocessing, probably with a Queue to pass the frames around, but the overhead may not be worth it – nathancy Jun 25 '19 at 21:02
  • @nathancy I also think the overhead of passing data via a queue is possibly too great which is why I need OP to answer my question about image size and colour.... – Mark Setchell Jun 25 '19 at 21:20
  • For the prototype I am using a camera with a resolution of 640*480 and use cv2.pyrDown() once on it for processing. For the UI I either need the original image or have to cv2.pyrUp() the processed image. – le_lemon Jun 26 '19 at 05:49

3 Answers


Since I do not have 50 reputation, I could not post this as a comment. I am not very experienced with this either, but a little searching led me to the following website, which talks about real-time video processing using the multiprocessing library. I hope it helps.

1) Read frames and put them into the input queue with a corresponding frame number for each:

  # Check input queue is not full
  if not input_q.full():
      # Read frame and store in input queue
      ret, frame = vs.read()
      if ret:
          input_q.put((int(vs.get(cv2.CAP_PROP_POS_FRAMES)), frame))

2) Take frames from the input queue, process them, and put them into the output queue with their corresponding frame numbers:

while True:
    frame = input_q.get()
    frame_rgb = cv2.cvtColor(frame[1], cv2.COLOR_BGR2RGB)
    output_q.put((frame[0], detect_objects(frame_rgb, sess, detection_graph)))

3) Recover the processed frame from the output queue and feed it into the priority queue if the output queue is not empty:

# Check output queue is not empty
if not output_q.empty():
    # Recover treated frame in output queue and feed priority queue
    output_pq.put(output_q.get())

4) Draw your frames from the priority queue until it is empty:

# Check output priority queue is not empty
if not output_pq.empty():
    prior, output_frame = output_pq.get()
    if prior > countWriteFrame:
        output_pq.put((prior, output_frame))
    else:
        countWriteFrame = countWriteFrame + 1
        # Draw something with your frame

5) Finally, to stop, check that reading has finished and that all queues are empty. If so, break:

if ((not ret) and input_q.empty() and
        output_q.empty() and output_pq.empty()):
    break
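
In the article these loops run in their own worker processes; a rough sketch of how they could be wired together with multiprocessing is below (detect_objects here is just a placeholder for the real detection code):

from multiprocessing import Process, Queue

def detect_objects(frame):
    # placeholder for the real detection code from the article
    return frame

def worker(input_q, output_q):
    # step 2 runs here: pull a numbered frame, process it, push the result
    while True:
        item = input_q.get()
        if item is None:              # sentinel: no more frames, shut down
            break
        frame_number, frame = item
        output_q.put((frame_number, detect_objects(frame)))

if __name__ == "__main__":
    input_q = Queue(maxsize=5)
    output_q = Queue(maxsize=5)

    # start a few worker processes that share the two queues
    workers = [Process(target=worker, args=(input_q, output_q)) for _ in range(2)]
    for w in workers:
        w.start()

    # ... steps 1, 3 and 4 (reading frames, draining output_q) go in the main loop ...

    for _ in workers:
        input_q.put(None)             # one sentinel per worker
    for w in workers:
        w.join()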

Link can be found HERE

r_e
  • Thanks a lot for your comment, I'm gonna look into it at 15:00 CEST. – le_lemon Jun 26 '19 at 05:54
  • So I read your comment and I don't get how it should improve the performance, because it only buffers the input and output images in a queue and does no multiprocessing as far as I can see. But maybe I could use this to fill up the input queue, spawn multiple processes that work on the images in the queue simultaneously, and put the results into the output queue. (I also read the article at the link.) – le_lemon Jun 26 '19 at 13:34
  • I am not quite sure. I searched around and came across it, and thought it would help with processing the images. Besides that link, maybe these two would be helpful: https://gist.github.com/smhanov/8fb48199338045fc5e69fd615211c84c https://docs.python.org/2/library/multiprocessing.html – r_e Jun 26 '19 at 13:48
  • Thanks for the link, it brought me to the toolkit mpipe, which seems quite useful for this use case. I'm working on it at the moment. – le_lemon Jun 26 '19 at 15:17
  • For sure. You can check more about mpipe here: http://vmlaker.github.io/mpipe/ After coding, please do not forget to post the final answer here and upvote the useful answers. Good luck! – r_e Jun 26 '19 at 15:23
  • I can't upvote because I only have 11 reputation :/ – le_lemon Jun 26 '19 at 16:17
  • I got mpipe to work but had problems with the tracking (which sped up the program since it didn't need to detect objects in every frame), so I can't really compare it to the normal program right now. As of now it is a bit slower, so if I get the tracking to work it may be faster. I'll post the code anyway. – le_lemon Jun 26 '19 at 16:19
  • Great. Good job! Make sure to give the correct URL. The current one is not working – r_e Jun 26 '19 at 18:10
  • Hi @le_lemon, instead of editing the question with an answer, please go all the way down to the page, and click on "Answer your own question". Paste your answer there and modify the question so it only shows the question and edits, not the answer. After 3 days, accept your answer as an "Answer" to the question. – r_e Jul 12 '19 at 13:46

I had a little attempt at this. It is heavily based on your diagram and uses a 5-stage pipeline and multi-processing. Start reading near the end at:

def main():
    ...
    ...

#!/usr/bin/env python3

import logging
import numpy as np
from time import sleep
from multiprocessing import Process, Queue

class Stage1(Process):
    """Acquire frames as fast as possible and send to next stage"""
    def __init__(self, oqueue):
        super().__init__()
        # Pick up parameters and store in class variables
        self.oqueue = oqueue      # output queue

    def run(self,):
        # Turn on logging
        logging.basicConfig(level=logging.DEBUG,
                        format='%(created).6f [%(levelname)s] Stage1 %(message)s',
                        filename='log-stage1.txt', filemode='w')
        logging.info('started')

        # Generate frames and send down pipeline
        for f in range(NFRAMES):
            logging.debug('Generating frame %d',f)
            # Generate frame of random stuff
            frame = np.random.randint(0,256,(480,640,3), dtype=np.uint8)
            logging.debug('Forwarding frame %d',f)
            self.oqueue.put(frame)

class Stage2(Process):
    """Read frames from previous stage as fast as possible, process and send to next stage"""
    def __init__(self, iqueue, oqueue):
        super().__init__()
        # Pick up parameters and store in class variables
        self.iqueue = iqueue      # input queue
        self.oqueue = oqueue      # output queue

    def run(self,):
        # Turn on logging
        logging.basicConfig(level=logging.DEBUG,
                        format='%(created).6f [%(levelname)s] Stage2 %(message)s',
                        filename='log-stage2.txt', filemode='w')
        logging.info('started')

        for f in range(NFRAMES):
            # Wait for next frame
            frame = self.iqueue.get()
            logging.debug('Received frame %d', f)
            # Process frame ...

            logging.debug('Forwarding frame %d', f)
            self.oqueue.put(frame)

class Stage3(Process):
    """Read frames from previous stage as fast as possible, process and send to next stage"""
    def __init__(self, iqueue, oqueue):
        super().__init__()
        # Pick up parameters and store in class variables
        self.iqueue = iqueue      # input queue
        self.oqueue = oqueue      # output queue

    def run(self,):
        # Turn on logging
        logging.basicConfig(level=logging.DEBUG,
                        format='%(created).6f [%(levelname)s] Stage3 %(message)s',
                        filename='log-stage3.txt', filemode='w')
        logging.info('started')
        for f in range(NFRAMES):
            # Wait for next frame
            frame = self.iqueue.get()
            logging.debug('Received frame %d', f)
            # Process frame ...

            logging.debug('Forwarding frame %d', f)
            self.oqueue.put(frame)

class Stage4(Process):
    """Read frames from previous stage as fast as possible, process and send to next stage"""
    def __init__(self, iqueue, oqueue):
        super().__init__()
        # Pick up parameters and store in class variables
        self.iqueue = iqueue      # input queue
        self.oqueue = oqueue      # output queue

    def run(self,):
        # Turn on logging
        logging.basicConfig(level=logging.DEBUG,
                        format='%(created).6f [%(levelname)s] Stage4 %(message)s',
                        filename='log-stage4.txt', filemode='w')
        logging.info('started')

        for f in range(NFRAMES):
            # Wait for next frame
            frame = self.iqueue.get()
            logging.debug('Received frame %d', f)
            # Process frame ...

            logging.debug('Forwarding frame %d', f)
            self.oqueue.put(frame)

class Stage5(Process):
    """Read frames from previous stage as fast as possible, and display"""
    def __init__(self, iqueue):
        super().__init__()
        # Pick up parameters and store in class variables
        self.iqueue = iqueue      # input queue

    def run(self,):
        # Turn on logging
        logging.basicConfig(level=logging.DEBUG,
                        format='%(created).6f [%(levelname)s] Stage5 %(message)s',
                        filename='log-stage5.txt', filemode='w')
        logging.info('started')

        for f in range(NFRAMES):
            # Wait for next frame
            frame = self.iqueue.get()
            logging.debug('Displaying frame %d', f)
            # Display frame ...

def main():
    # Create Queues to send data between pipeline stages
    q1_2 = Queue(5)    # queue between stages 1 and 2
    q2_3 = Queue(5)    # queue between stages 2 and 3
    q3_4 = Queue(5)    # queue between stages 3 and 4
    q4_5 = Queue(5)    # queue between stages 4 and 5

    # Create Processes for stages of pipeline
    stages = []
    stages.append(Stage1(q1_2))
    stages.append(Stage2(q1_2,q2_3))
    stages.append(Stage3(q2_3,q3_4))
    stages.append(Stage4(q3_4,q4_5))
    stages.append(Stage5(q4_5))

    # Start the stages
    for stage in stages:
        stage.start()

    # Wait for stages to finish
    for stage in stages:
        stage.join()

if __name__ == "__main__":
    NFRAMES = 1000
    main()

At the moment it just generates a frame of random noise and passes it down the pipeline. It logs each process to a separate file that it overwrites for each new run of the program because of filemode='w'. You can see the individual logs like this:

-rw-r--r--  1 mark  staff  1097820 26 Jun 17:07 log-stage1.txt
-rw-r--r--  1 mark  staff  1077820 26 Jun 17:07 log-stage2.txt
-rw-r--r--  1 mark  staff  1077820 26 Jun 17:07 log-stage3.txt
-rw-r--r--  1 mark  staff  1077820 26 Jun 17:07 log-stage4.txt
-rw-r--r--  1 mark  staff   548930 26 Jun 17:07 log-stage5.txt

You can then see the times each process received and sent each frame:

more log-stage1.txt

1561565618.603456 [INFO] Stage1 started
1561565618.604812 [DEBUG] Stage1 Generating frame 0
1561565618.623938 [DEBUG] Stage1 Forwarding frame 0
1561565618.625659 [DEBUG] Stage1 Generating frame 1
1561565618.647139 [DEBUG] Stage1 Forwarding frame 1
1561565618.648173 [DEBUG] Stage1 Generating frame 2
1561565618.687316 [DEBUG] Stage1 Forwarding frame 2

Or track, say, "frame 1" through the stages:

pi@pi3:~ $ grep "frame 1$" log*

log-stage1.txt:1561565618.625659 [DEBUG] Stage1 Generating frame 1
log-stage1.txt:1561565618.647139 [DEBUG] Stage1 Forwarding frame 1
log-stage2.txt:1561565618.671272 [DEBUG] Stage2 Received frame 1
log-stage2.txt:1561565618.672272 [DEBUG] Stage2 Forwarding frame 1
log-stage3.txt:1561565618.713618 [DEBUG] Stage3 Received frame 1
log-stage3.txt:1561565618.715468 [DEBUG] Stage3 Forwarding frame 1
log-stage4.txt:1561565618.746488 [DEBUG] Stage4 Received frame 1
log-stage4.txt:1561565618.747617 [DEBUG] Stage4 Forwarding frame 1
log-stage5.txt:1561565618.790802 [DEBUG] Stage5 Displaying frame 1

Or combine all the logs together in time order:

sort -g log*

1561565618.603456 [INFO] Stage1 started
1561565618.604812 [DEBUG] Stage1 Generating frame 0
1561565618.607765 [INFO] Stage2 started
1561565618.612311 [INFO] Stage3 started
1561565618.618425 [INFO] Stage4 started
1561565618.618785 [INFO] Stage5 started
1561565618.623938 [DEBUG] Stage1 Forwarding frame 0
1561565618.625659 [DEBUG] Stage1 Generating frame 1
1561565618.640585 [DEBUG] Stage2 Received frame 0
1561565618.642438 [DEBUG] Stage2 Forwarding frame 0
Mark Setchell
  • Neat example! Suppose, I want the oqueue of Stage1 to have a max length of 3. How do I stall Stage1 as soon as it has 3 images waiting to be fetched by Stage2? (After an image is fetched by Stage2, Stage1 will read the next image so that it always has images for Stage2 but not too many images to fill up the memory.) – Edifice Feb 14 '23 at 16:07

So with the help of r_e I found a neat toolkit called mpipe, which can be used for pipelining in Python.

While testing I found out that importing and displaying images is a lot faster than conversion, processing and drawing the UI, so I am only using a three-stage pipeline.

It is pretty easy to use:

def conversion(input_data):
    original, frame_counter = input_data
    ...
    return con.gbg(con.down_sample(con.copy_img(original))), frame_counter

def processing(input_data):
    image, frame_counter = input_data
    ...
    return markers, frame_counter


def ui(input_data):
    markers, frame_counter = input_data
    ...
    return image, frame_counter, triangle


def main():
    ...
    while True:
        stage1 = mpipe.OrderedStage(conversion, 3)

        stage2 = mpipe.OrderedStage(processing, 3)

        stage3 = mpipe.OrderedStage(ui, 3)

        pipe = mpipe.Pipeline(stage1.link(stage2.link(stage3)))

        images = []
        while len(images) < 3:
            ret = False
            while not ret:
                ret, image = cap.read()
            images.append(image)

        for i in images:
            t = (i, frame_counter)
            pipe.put(t)

        pipe.put(None)

        for result in pipe.results():
            image, frame_counter, triangle = result
            if not triangle:
                if t_count > 6:
                    Show.show_win("video", image)
                    t_count = 0
                else:
                    t_count += 1
            else:
                Show.show_win("video", image)
le_lemon