I'm trying to use OpenCV on an embedded Linux distribution running on an i.MX8QM processor. I have an MJPEG USB camera connected to this board that can produce MJPEG output at 1920x1080 and 60 FPS; this was confirmed in both OpenCV and GStreamer. The final objective is to grab frames from the camera and overlay some text/images on them.

Now I'm running into a serious limitation of the VideoCapture class: it needs frames with 3 channels of data, but the GStreamer pipeline that grabs the frames from the camera and decodes them to a raw format can only produce 4-channel images (BGRx, for example). As soon as I add a simple videoconvert element to the pipeline, the per-frame processing time increases from ~15 ms to ~500 ms.
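Roughly the kind of setup I mean (simplified; /dev/video0 and the i.MX element names are placeholders, not necessarily exactly what's in my build). The trailing videoconvert step is the one that blows up the frame time:

```
// Minimal sketch of the VideoCapture setup described above.
#include <opencv2/core.hpp>
#include <opencv2/videoio.hpp>
#include <string>

int main() {
    std::string pipeline =
        "v4l2src device=/dev/video0 ! image/jpeg,width=1920,height=1080,framerate=60/1 ! "
        "v4l2jpegdec ! imxvideoconvert_g2d ! video/x-raw,format=BGRx ! "
        "videoconvert ! video/x-raw,format=BGR ! "   // CPU-only step: ~15 ms -> ~500 ms per frame
        "appsink sync=false";

    cv::VideoCapture cap(pipeline, cv::CAP_GSTREAMER);
    if (!cap.isOpened()) return 1;

    cv::Mat frame;                      // VideoCapture insists on 3-channel BGR here
    while (cap.read(frame)) {
        // overlay text/images on `frame`, then hand it off for display/encode
    }
    return 0;
}
```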
From everything I could find, this is because the rest of the pipeline (the JPEG-to-raw decoding) runs on the processor's hardware acceleration units, while stripping that extra channel is done on the CPU, and I couldn't find any mention online of a solution that works. Now I'm trying to zoom out and understand whether I've taken a wrong turn somewhere:
- Am I using the right camera? It seems that MJPEG is the most common format for the resolution and framerate I need, so I don't see much choice there.
- Is OpenCV the right tool for the job? Are there other libraries that integrate better with this GStreamer pipeline and can work directly with BGRx frames? (See the sketch after this list for what I mean by that.)
- Is everything configured correctly? I believe it is, judging by the fact that other people are reporting the same limitation and the reasoning makes sense.
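To illustrate what I mean by working directly with BGRx frames: something like skipping VideoCapture on the capture side and pulling the buffers straight from an appsink, since OpenCV's drawing functions accept a 4-channel Mat. This is only a rough sketch with the same placeholder element names as above, and it assumes the BGRx rows are tightly packed (otherwise the stride would need handling, e.g. via GstVideoFrame):

```
#include <gst/gst.h>
#include <gst/app/gstappsink.h>
#include <opencv2/core.hpp>
#include <opencv2/imgproc.hpp>

int main(int argc, char **argv) {
    gst_init(&argc, &argv);

    GError *err = nullptr;
    GstElement *pipeline = gst_parse_launch(
        "v4l2src device=/dev/video0 ! image/jpeg,width=1920,height=1080,framerate=60/1 ! "
        "v4l2jpegdec ! imxvideoconvert_g2d ! video/x-raw,format=BGRx ! "
        "appsink name=sink sync=false max-buffers=2 drop=true",   // no videoconvert anywhere
        &err);
    if (!pipeline) {
        g_printerr("Failed to build pipeline: %s\n", err->message);
        return 1;
    }

    GstElement *sink = gst_bin_get_by_name(GST_BIN(pipeline), "sink");
    gst_element_set_state(pipeline, GST_STATE_PLAYING);

    for (;;) {
        GstSample *sample = gst_app_sink_pull_sample(GST_APP_SINK(sink));
        if (!sample) break;                        // EOS or error

        GstCaps *caps = gst_sample_get_caps(sample);
        GstStructure *s = gst_caps_get_structure(caps, 0);
        int width = 0, height = 0;
        gst_structure_get_int(s, "width", &width);
        gst_structure_get_int(s, "height", &height);

        GstBuffer *buffer = gst_sample_get_buffer(sample);
        GstMapInfo map;
        if (gst_buffer_map(buffer, &map, GST_MAP_READ)) {
            // Copy into a 4-channel Mat (a plain memcpy, much cheaper than a
            // CPU pixel-format conversion) and draw on the copy.
            cv::Mat frame = cv::Mat(height, width, CV_8UC4, map.data).clone();
            gst_buffer_unmap(buffer, &map);
            cv::putText(frame, "overlay", cv::Point(50, 80),
                        cv::FONT_HERSHEY_SIMPLEX, 1.0, cv::Scalar(0, 255, 0, 255), 2);
            // ...hand `frame` to the display/encode branch here...
        }
        gst_sample_unref(sample);
    }

    gst_element_set_state(pipeline, GST_STATE_NULL);
    gst_object_unref(sink);
    gst_object_unref(pipeline);
    return 0;
}
```

(This would be built against gstreamer-1.0, gstreamer-app-1.0 and OpenCV core/imgproc.)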
I'm open to any ideas or suggestions. Thank you!
LE: I spent almost the entire weekend debugging this. I'll post what I've done, step by step:
- Confirmed that the camera used for testing really can produce 1920x1080 JPEG-compressed frames at 60 FPS. It can: I verified this both on Windows with a test app and on the target itself, by eliminating all of the JPEG decoding work from the pipeline (a rough sketch of that timing check follows this list).
- Upon further investigation, it seems that although the VPU is used to decode individual JPEG frames, full MJPEG decoding is not actually done in hardware. I located the release notes for the BSP in my build, and it appears that my processor (i.MX8 QuadMax) doesn't support MJPEG decoding with the VPU, which would also explain why I have to use v4l2jpegdec instead of the more aptly named v4l2video0jpegdec; the latter just produces a still frame.
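For completeness, this is roughly the timing check behind those two steps: pull frames from an appsink and print the inter-frame interval. With the v4l2jpegdec element removed it times the camera alone (the first bullet); with it in place it times the JPEG decode; swapping in v4l2video0jpegdec only ever gave me a single frame. Element names and the device path are placeholders as before:

```
#include <gst/gst.h>
#include <gst/app/gstappsink.h>
#include <chrono>
#include <cstdio>

int main(int argc, char **argv) {
    gst_init(&argc, &argv);

    GError *err = nullptr;
    GstElement *pipeline = gst_parse_launch(
        "v4l2src device=/dev/video0 ! image/jpeg,width=1920,height=1080,framerate=60/1 ! "
        "v4l2jpegdec ! "              // drop this element to time the camera alone
        "appsink name=sink sync=false max-buffers=2 drop=true",
        &err);
    if (!pipeline) {
        g_printerr("Failed to build pipeline: %s\n", err->message);
        return 1;
    }

    GstElement *sink = gst_bin_get_by_name(GST_BIN(pipeline), "sink");
    gst_element_set_state(pipeline, GST_STATE_PLAYING);

    auto last = std::chrono::steady_clock::now();
    for (int i = 0; i < 300; ++i) {            // ~5 s worth of frames at 60 FPS
        GstSample *sample = gst_app_sink_pull_sample(GST_APP_SINK(sink));
        if (!sample) break;                    // EOS or error
        auto now = std::chrono::steady_clock::now();
        std::printf("frame %3d: %6.2f ms\n", i,
                    std::chrono::duration<double, std::milli>(now - last).count());
        last = now;
        gst_sample_unref(sample);
    }

    gst_element_set_state(pipeline, GST_STATE_NULL);
    gst_object_unref(sink);
    gst_object_unref(pipeline);
    return 0;
}
```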