I'm trying to use OpenCV on an embedded Linux distribution running on an i.MX8QM processor. I have an MJPEG USB camera connected to this board that can produce MJPEG output at 1920x1080 and 60 FPS; this was confirmed in both OpenCV and GStreamer. The final objective is to grab frames from the camera and overlay some text/images on them.

Now I'm running into a serious limitation of the VideoCapture class: it needs frames with 3 channels of data, but the GStreamer pipeline that grabs the frames from the camera and decodes them to a raw format can only produce 4-channel images (BGRx, for example). As soon as I add a simple videoconvert element to the pipeline, the per-frame processing time increases from ~15 ms to ~500 ms.
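Roughly the kind of setup I mean (simplified; /dev/video0 and the i.MX element names are placeholders, not necessarily exactly what's in my build). The trailing videoconvert step is the one that blows up the frame time:

```
// Minimal sketch of the VideoCapture setup described above.
#include <opencv2/core.hpp>
#include <opencv2/videoio.hpp>
#include <string>

int main() {
    std::string pipeline =
        "v4l2src device=/dev/video0 ! image/jpeg,width=1920,height=1080,framerate=60/1 ! "
        "v4l2jpegdec ! imxvideoconvert_g2d ! video/x-raw,format=BGRx ! "
        "videoconvert ! video/x-raw,format=BGR ! "   // CPU-only step: ~15 ms -> ~500 ms per frame
        "appsink sync=false";

    cv::VideoCapture cap(pipeline, cv::CAP_GSTREAMER);
    if (!cap.isOpened()) return 1;

    cv::Mat frame;                      // VideoCapture insists on 3-channel BGR here
    while (cap.read(frame)) {
        // overlay text/images on `frame`, then hand it off for display/encode
    }
    return 0;
}
```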
From everything I could find, this is because the rest of the pipeline (the JPEG-to-raw decoding) runs on the processor's hardware acceleration units, while stripping that extra channel is done on the CPU, and I couldn't find any mention online of a solution that works. Now I'm trying to zoom out and understand whether I've taken a wrong turn somewhere:
- Am I using the right camera? It seems that MJPEG is the most common format for the resolution and framerate I need, so I don't see much choice there.
- Is OpenCV the right tool for the job? Are there other libraries that integrate better with this GStreamer pipeline and can work directly with BGRx frames? (See the sketch after this list for what I mean by that.)
- Is everything configured correctly? I believe it is, judging by the fact that other people are reporting the same limitation and the reasoning makes sense.
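To illustrate what I mean by working directly with BGRx frames: something like skipping VideoCapture on the capture side and pulling the buffers straight from an appsink, since OpenCV's drawing functions accept a 4-channel Mat. This is only a rough sketch with the same placeholder element names as above, and it assumes the BGRx rows are tightly packed (otherwise the stride would need handling, e.g. via GstVideoFrame):

```
#include <gst/gst.h>
#include <gst/app/gstappsink.h>
#include <opencv2/core.hpp>
#include <opencv2/imgproc.hpp>

int main(int argc, char **argv) {
    gst_init(&argc, &argv);

    GError *err = nullptr;
    GstElement *pipeline = gst_parse_launch(
        "v4l2src device=/dev/video0 ! image/jpeg,width=1920,height=1080,framerate=60/1 ! "
        "v4l2jpegdec ! imxvideoconvert_g2d ! video/x-raw,format=BGRx ! "
        "appsink name=sink sync=false max-buffers=2 drop=true",   // no videoconvert anywhere
        &err);
    if (!pipeline) {
        g_printerr("Failed to build pipeline: %s\n", err->message);
        return 1;
    }

    GstElement *sink = gst_bin_get_by_name(GST_BIN(pipeline), "sink");
    gst_element_set_state(pipeline, GST_STATE_PLAYING);

    for (;;) {
        GstSample *sample = gst_app_sink_pull_sample(GST_APP_SINK(sink));
        if (!sample) break;                        // EOS or error

        GstCaps *caps = gst_sample_get_caps(sample);
        GstStructure *s = gst_caps_get_structure(caps, 0);
        int width = 0, height = 0;
        gst_structure_get_int(s, "width", &width);
        gst_structure_get_int(s, "height", &height);

        GstBuffer *buffer = gst_sample_get_buffer(sample);
        GstMapInfo map;
        if (gst_buffer_map(buffer, &map, GST_MAP_READ)) {
            // Copy into a 4-channel Mat (a plain memcpy, much cheaper than a
            // CPU pixel-format conversion) and draw on the copy.
            cv::Mat frame = cv::Mat(height, width, CV_8UC4, map.data).clone();
            gst_buffer_unmap(buffer, &map);
            cv::putText(frame, "overlay", cv::Point(50, 80),
                        cv::FONT_HERSHEY_SIMPLEX, 1.0, cv::Scalar(0, 255, 0, 255), 2);
            // ...hand `frame` to the display/encode branch here...
        }
        gst_sample_unref(sample);
    }

    gst_element_set_state(pipeline, GST_STATE_NULL);
    gst_object_unref(sink);
    gst_object_unref(pipeline);
    return 0;
}
```

(This would be built against gstreamer-1.0, gstreamer-app-1.0 and OpenCV core/imgproc.)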
I'm open to any ideas or suggestions. Thank you!
LE: I spent almost the entire weekend debugging this. I'll post what I've done, step by step:
- Confirmed that the camera used for testing really can produce 1920x1080 JPEG-compressed frames at 60 FPS. It can: I verified this both on Windows with a test app and on the target itself, by eliminating all of the JPEG decoding work from the pipeline (a rough sketch of that timing check follows this list).
- Upon further investigation, it seems that although the VPU is used to decode individual JPEG frames, full MJPEG decoding is not actually done in hardware. I located the release notes for the BSP in my build, and it appears that my processor (i.MX8 QuadMax) doesn't support MJPEG decoding with the VPU, which would also explain why I have to use v4l2jpegdec instead of the more aptly named v4l2video0jpegdec; the latter just produces a still frame.
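For completeness, this is roughly the timing check behind those two steps: pull frames from an appsink and print the inter-frame interval. With the v4l2jpegdec element removed it times the camera alone (the first bullet); with it in place it times the JPEG decode; swapping in v4l2video0jpegdec only ever gave me a single frame. Element names and the device path are placeholders as before:

```
#include <gst/gst.h>
#include <gst/app/gstappsink.h>
#include <chrono>
#include <cstdio>

int main(int argc, char **argv) {
    gst_init(&argc, &argv);

    GError *err = nullptr;
    GstElement *pipeline = gst_parse_launch(
        "v4l2src device=/dev/video0 ! image/jpeg,width=1920,height=1080,framerate=60/1 ! "
        "v4l2jpegdec ! "              // drop this element to time the camera alone
        "appsink name=sink sync=false max-buffers=2 drop=true",
        &err);
    if (!pipeline) {
        g_printerr("Failed to build pipeline: %s\n", err->message);
        return 1;
    }

    GstElement *sink = gst_bin_get_by_name(GST_BIN(pipeline), "sink");
    gst_element_set_state(pipeline, GST_STATE_PLAYING);

    auto last = std::chrono::steady_clock::now();
    for (int i = 0; i < 300; ++i) {            // ~5 s worth of frames at 60 FPS
        GstSample *sample = gst_app_sink_pull_sample(GST_APP_SINK(sink));
        if (!sample) break;                    // EOS or error
        auto now = std::chrono::steady_clock::now();
        std::printf("frame %3d: %6.2f ms\n", i,
                    std::chrono::duration<double, std::milli>(now - last).count());
        last = now;
        gst_sample_unref(sample);
    }

    gst_element_set_state(pipeline, GST_STATE_NULL);
    gst_object_unref(sink);
    gst_object_unref(pipeline);
    return 0;
}
```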