4

Dlib has a really handy, fast and efficient object detection routine, and I wanted to make a cool face tracking example similar to the example here.

OpenCV, which is widely supported, has a VideoCapture module that is fairly quick (about a fifth of a second per snapshot, compared with a second or more for calling up some external program that wakes the webcam and fetches a picture). I added this to the face detector Python example in Dlib.

If you show and process the OpenCV VideoCapture output directly, it looks odd because OpenCV apparently stores the channels in BGR rather than RGB order. After adjusting for this, it works, but slowly:

from __future__ import division
import sys

import dlib
from skimage import io


detector = dlib.get_frontal_face_detector()
win = dlib.image_window()

if len( sys.argv[1:] ) == 0:
    from cv2 import VideoCapture
    from time import time

    cam = VideoCapture(0)  #set the port of the camera as before

    while True:
        start = time()
        retval, image = cam.read()  # returns True and the image if all goes right

        for row in image:
            for px in row:
                #rgb expected... but the array is bgr?
                r = px[2]
                px[2] = px[0]
                px[0] = r
        #import matplotlib.pyplot as plt
        #plt.imshow(image)
        #plt.show()

        print( "readimage: " + str( time() - start ) )

        start = time()
        dets = detector(image, 1)
        print("your faces: {}".format(len(dets)))
        for i, d in enumerate( dets ):
            print("Detection {}: Left: {} Top: {} Right: {} Bottom: {}".format(
                i, d.left(), d.top(), d.right(), d.bottom()))
            print("from left: {}".format( ( (d.left() + d.right()) / 2 ) / len(image[0]) ))
            print("from top: {}".format( ( (d.top() + d.bottom()) / 2 ) /len(image)) )
        print( "process: " + str( time() - start ) )

        start = time()
        win.clear_overlay()
        win.set_image(image)
        win.add_overlay(dets)

        print( "show: " + str( time() - start ) )
        #dlib.hit_enter_to_continue()



for f in sys.argv[1:]:
    print("Processing file: {}".format(f))
    img = io.imread(f)
    # The 1 in the second argument indicates that we should upsample the image
    # 1 time.  This will make everything bigger and allow us to detect more
    # faces.
    dets = detector(img, 1)
    print("Number of faces detected: {}".format(len(dets)))
    for i, d in enumerate(dets):
        print("Detection {}: Left: {} Top: {} Right: {} Bottom: {}".format(
            i, d.left(), d.top(), d.right(), d.bottom()))

    win.clear_overlay()
    win.set_image(img)
    win.add_overlay(dets)
    dlib.hit_enter_to_continue()


# Finally, if you really want to you can ask the detector to tell you the score
# for each detection.  The score is bigger for more confident detections.
# Also, the idx tells you which of the face sub-detectors matched.  This can be
# used to broadly identify faces in different orientations.
if (len(sys.argv[1:]) > 0):
    img = io.imread(sys.argv[1])
    dets, scores, idx = detector.run(img, 1)
    for i, d in enumerate(dets):
        print("Detection {}, score: {}, face_type:{}".format(
            d, scores[i], idx[i]))

From the timing output of this program, grabbing the picture and processing it each take about a fifth of a second, so you would expect one or two updates per second. However, if you raise your hand, it only shows up in the webcam view after 5 seconds or so!

Is there some sort of internal cache keeping it from grabbing the latest webcam image? Could I adjust or multi-thread the webcam input process to fix the lag? This is on an Intel i5 with 16gb RAM.
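As an aside, the per-pixel channel swap above is itself a big cost in Python; the same BGR-to-RGB conversion can be done in one vectorized step. A minimal sketch, assuming the frame is a NumPy uint8 array in BGR order as cam.read() returns (the ascontiguousarray call is there because dlib may want a contiguous array after the axis reversal):

```python
import numpy as np

# Stand-in for a frame from cam.read(): a 2x2 image in BGR order
frame_bgr = np.array([[[255, 0, 0], [0, 255, 0]],
                      [[0, 0, 255], [10, 20, 30]]], dtype=np.uint8)

# Reverse the channel axis in one step: BGR -> RGB, no Python loops
frame_rgb = np.ascontiguousarray(frame_bgr[:, :, ::-1])

print(frame_rgb[0, 0])  # the pure-blue BGR pixel [255, 0, 0] comes out as RGB [0, 0, 255]
```

With OpenCV available, `cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)` does the same conversion in C.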

Update

According to this, read grabs the video frame by frame. That would explain the lag: each read returns the next buffered frame, and the display only catches up after working through all the frames that queued up while it was processing. Is there an option to set the framerate, or to drop buffered frames and just capture the current webcam picture on read? http://docs.opencv.org/3.0-beta/doc/py_tutorials/py_gui/py_video_display/py_video_display.html#capture-video-from-camera
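One common workaround for that kind of buffering (a pattern sketch, not something from the linked docs) is to read frames continuously in a background thread and keep only the newest one in a single-slot buffer, so the processing loop never sees stale frames. A self-contained sketch with a simulated producer standing in for the cam.read() loop:

```python
import threading
import time

class LatestFrame:
    """Single-slot buffer: each put overwrites, get always returns the newest."""
    def __init__(self):
        self._lock = threading.Lock()
        self._frame = None

    def put(self, frame):
        with self._lock:
            self._frame = frame

    def get(self):
        with self._lock:
            return self._frame

buf = LatestFrame()

def producer():
    # Stand-in for the cam.read() loop; produces frames 0..49 quickly
    for i in range(50):
        buf.put(i)
        time.sleep(0.001)

t = threading.Thread(target=producer)
t.start()
t.join()

print(buf.get())  # 49: all older frames were overwritten, only the newest remains
```

In the real script, the producer loop would do `retval, frame = cam.read()` and `buf.put(frame)`, while the detector always processes whatever `buf.get()` returns.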

NoBugs
  • The time is taken by dlib to detect the image, you could try resizing the image to smaller dimensions for better performance. – ZdaR Aug 05 '16 at 06:49
  • @ZdaR Thanks for the suggestion. Please run the example, you'll see that only takes a fraction of a second. Why does it take almost 5 seconds from moving, to showing the move in the webcam window (with many intermediate frames shown before it is up to date?) That is the question. – NoBugs Aug 05 '16 at 14:48
  • what is the camera resolution and face size? – Evgeniy Aug 08 '16 at 05:58
  • 1280x720, and position of face didn't seem to matter. – NoBugs Aug 08 '16 at 06:38

4 Answers

2

I feel your pain. I actually recently worked with that webcam script (multiple iterations; substantially edited). I got it to work really well, I think. So that you can see what I did, I created a GitHub Gist with the details (code; HTML readme file; sample output):

https://gist.github.com/victoriastuart/8092a3dd7e97ab57ede7614251bf5cbd

Victoria Stuart
1

Maybe the problem is that a threshold is set, as described here:

dots = detector(frame, 1)

Should be changed to

dots = detector(frame)

to avoid the threshold. This works for me, but at the same time there is a new problem: the frames are processed too fast.

kozlone
  • This is NOT a threshold. In the Python API it's the number of times the image is upscaled before running the detector. In this particular case '1' means 'upscale the image once'. This of course increases processing time (but allows smaller faces to be detected). – Alexey Antonenko Dec 21 '17 at 08:43
  • Thanks a lot for clarifying, Alexey! – kozlone Dec 21 '17 at 12:38
0

If you want to show a frame read in OpenCV, you can do it with the cv2.imshow() function without changing the channel order at all. On the other hand, if you still want to show the picture in matplotlib, you can't avoid swapping the channels, for example like this:

b,g,r = cv2.split(img)
img = cv2.merge((r,g,b))

That's the only thing I can help you with for now=)

Oresto
  • I think Dlib may need the array in the order it expects... or it may not detect right? Probably OK for HOGS algorithm as I've changed color of the object and it always detects it OK by the shape. – NoBugs Aug 05 '16 at 14:50
  • well, since all these algorithms operate on matrices, the order of the colours makes no difference. Also, the detection algorithms usually take black-and-white images, which is a plus for those who don't want trouble with colours. – Oresto Aug 08 '16 at 07:04
  • Also, OpenCV has its own detection algorithms, which, again, use black and white images for detection – Oresto Aug 08 '16 at 07:05
  • Unfortunately from what I had tested, OpenCV had some odd glitches where it would say a part of the wall was a face. Dlib seems more reliable, a bit faster and is easy to train different shapes into it. http://blog.dlib.net/2014/02/dlib-186-released-make-your-own-object.html – NoBugs Aug 09 '16 at 03:32
  • Comparison of OpenCV and Dlib face detection defaults: https://www.youtube.com/watch?v=LsK0hzcEyHI – NoBugs Aug 15 '16 at 05:42
  • This Opencv command seems to be the difference between big lag (iterating each pixel and setting it), and almost instant update. – NoBugs Aug 22 '16 at 04:02
0

I tried multithreading, and it was just as slow. Then I multithreaded with only the .read() in the thread, no processing and no thread locking, and it ran quite fast: maybe a second or so of delay, not 3 or 5. See http://www.pyimagesearch.com/2015/12/21/increasing-webcam-fps-with-python-and-opencv/

from __future__ import division
import sys
from time import time, sleep
import threading

import dlib
from skimage import io


detector = dlib.get_frontal_face_detector()
win = dlib.image_window()

class webCamGrabber( threading.Thread ):
    def __init__( self ):
        threading.Thread.__init__( self )
        #Lock for when you can read/write self.image:
        #self.imageLock = threading.Lock()
        self.image = False

        from cv2 import VideoCapture, cv
        from time import time

        self.cam = VideoCapture(0)  #set the port of the camera as before
        #self.cam.set(cv.CV_CAP_PROP_FPS, 1)


    def run( self ):
        while True:
            start = time()
            #self.imageLock.acquire()
            retval, self.image = self.cam.read()  # returns True and the image if all goes right

            print( type( self.image) )
            #import matplotlib.pyplot as plt
            #plt.imshow(image)
            #plt.show()

            #print( "readimage: " + str( time() - start ) )
            #sleep(0.1)

if len( sys.argv[1:] ) == 0:

    #Start webcam reader thread:
    camThread = webCamGrabber()
    camThread.start()

    #Setup window for results
    detector = dlib.get_frontal_face_detector()
    win = dlib.image_window()

    while True:
        #camThread.imageLock.acquire()
        if camThread.image is not False:
            print( "enter")
            start = time()

            myimage = camThread.image
            for row in myimage:
                for px in row:
                    #rgb expected... but the array is bgr?
                    r = px[2]
                    px[2] = px[0]
                    px[0] = r


            dets = detector( myimage, 0)
            #camThread.imageLock.release()
            print("your faces: {}".format(len(dets)))
            for i, d in enumerate( dets ):
                print("Detection {}: Left: {} Top: {} Right: {} Bottom: {}".format(
                    i, d.left(), d.top(), d.right(), d.bottom()))
                print("from left: {}".format( ( (d.left() + d.right()) / 2 ) / len(camThread.image[0]) ))
                print("from top: {}".format( ( (d.top() + d.bottom()) / 2 ) /len(camThread.image)) )
            print( "process: " + str( time() - start ) )

            start = time()
            win.clear_overlay()
            win.set_image(myimage)
            win.add_overlay(dets)

            print( "show: " + str( time() - start ) )
            #dlib.hit_enter_to_continue()



for f in sys.argv[1:]:
    print("Processing file: {}".format(f))
    img = io.imread(f)
    # The 1 in the second argument indicates that we should upsample the image
    # 1 time.  This will make everything bigger and allow us to detect more
    # faces.
    dets = detector(img, 1)
    print("Number of faces detected: {}".format(len(dets)))
    for i, d in enumerate(dets):
        print("Detection {}: Left: {} Top: {} Right: {} Bottom: {}".format(
            i, d.left(), d.top(), d.right(), d.bottom()))

    win.clear_overlay()
    win.set_image(img)
    win.add_overlay(dets)
    dlib.hit_enter_to_continue()


# Finally, if you really want to you can ask the detector to tell you the score
# for each detection.  The score is bigger for more confident detections.
# Also, the idx tells you which of the face sub-detectors matched.  This can be
# used to broadly identify faces in different orientations.
if (len(sys.argv[1:]) > 0):
    img = io.imread(sys.argv[1])
    dets, scores, idx = detector.run(img, 1)
    for i, d in enumerate(dets):
        print("Detection {}, score: {}, face_type:{}".format(
            d, scores[i], idx[i]))
NoBugs