
GOAL: Detect people in real time, with minimal delay/latency, from a remote video stream.

Setup:

  • Raspberry Pi 2 with a USB web camera, serving up images/a video stream using Flask.
  • Local machine (MacBook Pro) obtains the video stream and processes frames through OpenCV, Darknet/DarkFlow/YOLO, and TensorFlow.
  • Display the processed stream with detected people; each detected person gets a rectangle drawn around them.
  • Python 3

I currently have the base functionality working, BUT it seems to be rather slow. An image is processed only about every few seconds, when I need each frame processed in under a second. The result is video that lags well behind the stream and is choppy. From searching around, this seems to be a common problem, but I have not found a straightforward answer.
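One likely source of that lag is frames accumulating in the queue faster than they are processed, so the detector is always working on old frames. A minimal sketch of discarding stale frames and keeping only the newest one (the `latest_frame` helper name is my own, not from the code below):

```python
import queue

def latest_frame(q):
    """Drain the queue and return only the newest item, or None if empty.

    Older frames are dropped so the detector always works on the most
    recent image instead of falling further behind the live stream.
    """
    item = None
    while True:
        try:
            item = q.get_nowait()
        except queue.Empty:
            return item
```

In the main loop, `frame = latest_frame(incoming_frames)` would replace the `incoming_frames.get()` call, at the cost of skipping frames the detector cannot keep up with.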

I have implemented the stream grab as its own thread, as some forums suggested, but I believe the issue is now simply the time it takes to process each grabbed image.
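To confirm that inference, rather than frame grabbing, is the bottleneck, it can help to time each stage separately. A small sketch of a timing wrapper (`timed` is a hypothetical helper, not part of DarkFlow):

```python
import time

def timed(fn, *args, **kwargs):
    """Call fn and return (result, elapsed_seconds) using a monotonic clock."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start
```

Wrapping the detector call as `result, dt = timed(tfnet.return_predict, frame['frame'])` and printing `dt` each iteration would show whether the multi-second delay really comes from the model.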

Is it possible to improve performance? Do I need to do this processing in the cloud on a system with a good GPU so I can take advantage of that performance increase? Am I using the wrong YOLO weights and cfg? I know YOLOv3 is out, but I think I had issues getting it to work with my environment.
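Before moving to a cloud GPU, one cheap CPU-side option is shrinking each frame before it reaches the detector, since the network resizes its input to a fixed size anyway. A minimal sketch using NumPy stride slicing to halve resolution (assumes frames are the usual HxWx3 `uint8` arrays OpenCV returns; `downscale` is my own name):

```python
import numpy as np

def downscale(frame, factor=2):
    """Return every `factor`-th pixel in both dimensions.

    A crude, allocation-light downscale; cv2.resize with INTER_AREA
    would give smoother results if image quality matters.
    """
    return frame[::factor, ::factor]
```

Note that any bounding-box coordinates produced on the downscaled frame would need to be multiplied back by `factor` before drawing on the original.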

import queue
import threading
import time

import cv2
from darkflow.net.build import TFNet

incoming_frames = queue.Queue()

class Stream(threading.Thread):
    def __init__(self, ID):
        threading.Thread.__init__(self)
        self.cam = cv2.VideoCapture('http://raspberrypi.local:5000/')
        # Event used by stop(); it was referenced there but never created
        self._stop_event = threading.Event()

    def run(self):
        frame_id = 0
        # Exit the loop when stop() is called instead of running forever
        while not self._stop_event.is_set():
            ret, frame = self.cam.read()
            if ret:
                frame_id += 1
                incoming_frames.put({'frame': frame, 'id': frame_id})
                print("ACQUIRED FRAME " + str(frame_id))
                time.sleep(0.1)

    def stop(self):
        self._stop_event.set()

print("[INFO] Starting Process......")

print("[INFO] Load Model / Weights")
options = {"model": "cfg/yolo.cfg", "load": "bin/yolo.weights", "threshold": 0.1}
tfnet = TFNet(options)

print("[INFO] Start Video Grab Thread")
stream = Stream(0)
stream.start()


while True:
    if not incoming_frames.empty():
        frame = incoming_frames.get()
        result = tfnet.return_predict(frame['frame'])
        print("Processing Frame " + str(frame['id']))
        coordinates = []
        for detection in result:
            if detection['label'] == 'person' and detection['confidence'] >= 0.4:
                cv2.rectangle(frame['frame'],
                              (detection['topleft']['x'], detection['topleft']['y']),
                              (detection['bottomright']['x'], detection['bottomright']['y']),
                              (0, 255, 0), 2)
                body = {'x': detection['topleft']['x'], 'y': detection['topleft']['y'],
                        'width': (detection['bottomright']['x'] - detection['topleft']['x']),
                        'height': (detection['bottomright']['y'] - detection['topleft']['y'])}
                coordinates.append(body)
        cv2.imshow('Video', frame['frame'])
    # Press 'q' to quit so the cleanup below actually runs
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

stream.stop()
cv2.destroyAllWindows()