
I'm trying to capture a single image on demand from an RTSP H.264 video stream. I'm using OpenCV with Python running on a Raspberry Pi.

My understanding is that you can't simply capture an image, but rather must constantly read the stream of images from the video and discard all but the occasional one you want. This is very computationally expensive: it consumes about 25% of the CPU on a Pi just to read and discard 1280x720 15 fps H.264 RTSP video frames.

Is there another way? I'm flexible and can also use GStreamer, FFmpeg or anything else that is more computationally efficient.

FarNorth

3 Answers


The reason you have to read the stream is that H.264 has multiple kinds of frames (see https://en.wikipedia.org/wiki/Video_compression_picture_types): P- and B-frames need context to be decoded. Only I-frames (also known as keyframes) can be decoded standalone.

If you want to read truly arbitrary frames, you can parse (not decode) the stream and keep everything since the last I-frame. When your trigger comes, you decode the stream from that I-frame up to the current point.

If you do not need to be very precise, you can just store the last I-frame and decode it on demand. This will be very fast, but it means you may get a picture from the wrong time.
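The "keep everything since the last I-frame" bookkeeping can be sketched independently of any particular decoder. In this sketch, packets are modeled as plain (is_keyframe, payload) tuples; a real demuxer (e.g. PyAV) exposes equivalent fields on its packet objects, so treat the names here as stand-ins:

```python
class KeyframeBuffer:
    """Holds the packets needed to decode the current frame on demand:
    everything from the most recent I-frame onward."""

    def __init__(self):
        self._packets = []

    def push(self, is_keyframe, payload):
        if is_keyframe:
            # A fresh I-frame makes all older packets unnecessary.
            self._packets = []
        self._packets.append((is_keyframe, payload))

    def snapshot(self):
        # On trigger: hand this list to the decoder, starting
        # with the last I-frame.
        return list(self._packets)

buf = KeyframeBuffer()
for pkt in [(True, b'I0'), (False, b'P1'), (True, b'I1'), (False, b'P2')]:
    buf.push(*pkt)
print([p[1] for p in buf.snapshot()])  # → [b'I1', b'P2']
```

This is just the buffering logic; parsing RTSP/H.264 into packets is the demuxer's job.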

Finally, how often do those keyframes come? That depends on the source. For example, the C920 webcam generates them about every 5 seconds by default, but the interval can be configured anywhere from 1 to 30 seconds (I think; this was a while ago).

theamk
  • I am aware of how H.264 compresses data; however, I had forgotten that the spacing of I-frames can be significant and needs to be kept in mind. Thanks for raising this point. – FarNorth Feb 07 '19 at 18:41

I was doing something similar. Here is my code:

import urllib.request

import cv2
import numpy as np

def CaptureFrontCamera():
    _bytes = bytes()
    # Note: this reads an MJPEG-over-HTTP stream, not RTSP
    stream = urllib.request.urlopen('http://192.168.0.51/video.cgi?resolution=1920x1080')
    while True:
        _bytes += stream.read(1024)
        a = _bytes.find(b'\xff\xd8')  # JPEG start-of-image (SOI) marker
        b = _bytes.find(b'\xff\xd9')  # JPEG end-of-image (EOI) marker
        if a != -1 and b != -1:
            jpg = _bytes[a:b + 2]
            _bytes = _bytes[b + 2:]
            filename = '/home/pi/capture.jpeg'
            # np.frombuffer replaces the deprecated np.fromstring
            i = cv2.imdecode(np.frombuffer(jpg, dtype=np.uint8), cv2.IMREAD_COLOR)
            cv2.imwrite(filename, i)
            return filename
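The marker scan inside that loop can be factored into a small pure function, which makes the buffer handling easy to check offline. The bytes below are synthetic placeholders, not a real JPEG payload:

```python
def extract_jpeg(buf):
    """Return (jpeg_bytes, remaining_buffer).

    jpeg_bytes is None if no complete frame is in the buffer yet.
    A frame runs from the SOI marker 0xFFD8 through the EOI
    marker 0xFFD9, inclusive.
    """
    a = buf.find(b'\xff\xd8')
    b = buf.find(b'\xff\xd9')
    if a != -1 and b != -1:
        return buf[a:b + 2], buf[b + 2:]
    return None, buf

# Synthetic example: leading junk, one complete frame, trailing bytes.
frame, rest = extract_jpeg(b'junk\xff\xd8payload\xff\xd9tail')
print(frame)  # → b'\xff\xd8payload\xff\xd9'
print(rest)   # → b'tail'
```

Like the original loop, this assumes the EOI marker does not appear inside the frame data, which holds for plain MJPEG frames.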
Koxo
  • This looks promising, however I wasn't able to figure out how to do it with an rtsp (rather than http) video stream. Any thoughts? – FarNorth Feb 07 '19 at 18:44
  • Your link refers to using cv2.VideoCapture() to read rtsp streams, which is how I'm doing it now. The problem is read() consumes a ton of processing power and yields an image I don't need 99% of the time. Ideally I want it to ingest the byte stream and only construct the image when required. – FarNorth Feb 08 '19 at 15:47
  • Sorry I can't help you more. I am pretty new to image processing. Hope you find a solution. – Koxo Feb 11 '19 at 06:15

To answer my own question. Instead of using read() like this:

import threading

import cv2

cap = cv2.VideoCapture('rtsp_url')

def captureimages():
    global image
    while True:
        # read() decodes every frame, wanted or not
        ret, image = cap.read()

s = threading.Thread(target=captureimages)
s.start()

if takepic == True:
    picture = image.copy()

It is more efficient to break it up into grab() and retrieve(). Not a perfect solution, but better:

cap = cv2.VideoCapture('rtsp_url')

def captureimages():
    while True:
        cap.grab()  # advances the stream without fully decoding each frame

s = threading.Thread(target=captureimages)
s.start()

if takepic == True:
    ret, picture = cap.retrieve()  # decodes only the most recently grabbed frame
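One remaining wrinkle in the pattern above is that the grab loop and the trigger share state across threads. A lock-protected holder makes the hand-off explicit; the sketch below uses plain strings as stand-in frames so it runs without a camera or cv2:

```python
import threading

class LatestFrame:
    """Thread-safe holder for the most recent frame, mirroring the
    grab()/retrieve() split: the producer overwrites cheaply, the
    consumer copies out only when the trigger fires."""

    def __init__(self):
        self._lock = threading.Lock()
        self._frame = None

    def update(self, frame):      # called from the capture thread
        with self._lock:
            self._frame = frame

    def take_picture(self):       # called when the trigger fires
        with self._lock:
            return self._frame

holder = LatestFrame()

def capture_loop(frames):
    # Stand-in for `while True: cap.grab()`: each iteration publishes
    # the newest frame; the consumer never has to keep up.
    for f in frames:
        holder.update(f)

t = threading.Thread(target=capture_loop, args=(['f1', 'f2', 'f3'],))
t.start()
t.join()
print(holder.take_picture())  # → 'f3'
```

With cv2, update() would be fed from cap.retrieve() (or the loop would just grab() and the trigger path would retrieve() under the same lock).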
FarNorth