
I am interfacing a QVGA sensor that streams out YUV2-format data over USB to a host application on Windows. How can I use an OpenCV-Python example application to stream or capture the raw data from the YUV2 format?

How can I do that? Is there any test example to do so?

//opencv-python (host application)
import cv2
import numpy as np
    
# open video0
cap = cv2.VideoCapture(0, cv2.CAP_MSMF)
# set width and height
cap.set(cv2.CAP_PROP_FRAME_WIDTH, 340)
cap.set(cv2.CAP_PROP_FRAME_HEIGHT, 240)
# set fps
cap.set(cv2.CAP_PROP_FPS, 30)
while(True):
    # Capture frame-by-frame
    ret, frame = cap.read()
    # Display the resulting frame
    cv2.imshow('frame', frame)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

# When everything done, release the capture
cap.release()
cv2.destroyAllWindows()      

Code sample for grabbing video frames without decoding:

import cv2
import numpy as np

# open video0
# -------> Try replacing cv2.CAP_MSMF with cv2.CAP_FFMPEG:
cap = cv2.VideoCapture(0, cv2.CAP_FFMPEG)

# set width and height
cap.set(cv2.CAP_PROP_FRAME_WIDTH, 340)
cap.set(cv2.CAP_PROP_FRAME_HEIGHT, 240)
# set fps
cap.set(cv2.CAP_PROP_FPS, 30)

# Fetch undecoded RAW video streams
cap.set(cv2.CAP_PROP_FORMAT, -1)  # Format of the Mat objects. Set value -1 to fetch undecoded RAW video streams (as Mat 8UC1)

for i in range(10):
    # Capture frame-by-frame
    ret, frame = cap.read()

    if not ret:
        break

    print('frame.shape = {}    frame.dtype = {}'.format(frame.shape, frame.dtype))

cap.release()

In case cv2.CAP_FFMPEG is not working, try the following code sample:

import cv2
import numpy as np

# open video0
cap = cv2.VideoCapture(0, cv2.CAP_MSMF)

# set width and height
cap.set(cv2.CAP_PROP_FRAME_WIDTH, 340)
cap.set(cv2.CAP_PROP_FRAME_HEIGHT, 240)
# set fps
cap.set(cv2.CAP_PROP_FPS, 30)

# -----> Try setting FOURCC and disable RGB conversion:
#########################################################
cap.set(cv2.CAP_PROP_FOURCC, cv2.VideoWriter.fourcc('Y','1','6',' ')) 
cap.set(cv2.CAP_PROP_CONVERT_RGB, 0)    
#########################################################

# Fetch undecoded RAW video streams
cap.set(cv2.CAP_PROP_FORMAT, -1)  # Format of the Mat objects. Set value -1 to fetch undecoded RAW video streams (as Mat 8UC1)

for i in range(10):
    # Capture frame-by-frame
    ret, frame = cap.read()

    if not ret:
        break

    print('frame.shape = {}    frame.dtype = {}'.format(frame.shape, frame.dtype))

cap.release()

Reshape the uint8 frame to 680x240 and save as img.png:

import cv2
import numpy as np

# open video0
cap = cv2.VideoCapture(0, cv2.CAP_MSMF)

# set width and height
cap.set(cv2.CAP_PROP_FRAME_WIDTH, 340)
cap.set(cv2.CAP_PROP_FRAME_HEIGHT, 240)
cap.set(cv2.CAP_PROP_FPS, 30) # set fps

# Disable the conversion to BGR by setting FOURCC to Y16 and `CAP_PROP_CONVERT_RGB` to 0.
cap.set(cv2.CAP_PROP_FOURCC, cv2.VideoWriter.fourcc('Y','1','6',' ')) 
cap.set(cv2.CAP_PROP_CONVERT_RGB, 0)    

# Fetch undecoded RAW video streams
cap.set(cv2.CAP_PROP_FORMAT, -1)  # Format of the Mat objects. Set value -1 to fetch undecoded RAW video streams (as Mat 8UC1)

for i in range(10):
    # Capture frame-by-frame
    ret, frame = cap.read()

    if not ret:
        break

    cols = 340*2
    rows = 240

    img = frame.reshape(rows, cols)

    cv2.imwrite('img.png', img)

cap.release()

//680x240 img.png

//in presence of hot object (img1.png)

//processed image (hot object)

//with little-endian (test)

//test image (captured) with CAP_DSHOW

//test image (saved) with CAP_DSHOW

//680x240 (hand.png)

//680x240 (hand1.png)

//fing preview

//fing.png

//fing.png

Emlinux
  • Please suggest. – Emlinux Jan 27 '22 at 10:25
  • Any luck? `YUV2 streaming` is not a known format... – Rotem Jan 27 '22 at 20:51
  • @Rotem not yet. I am getting the video data from a sensor where it is converted to YUV2 format and sent over USB (UVC), and the posted image shows what I am getting at the host application on Windows. The YUV2 format is described here: https://www.fourcc.org/pixel-format/yuv-yuy2/ https://stackoverflow.com/questions/36228232/yuy2-vs-yuv-422 – Emlinux Jan 28 '22 at 03:19
  • `YUY2` and `YUV2` is not the same thing... I wonder if your camera is a Grayscale camera, and not colored. Can you please add few details about the model of the camera (or sensor)? – Rotem Jan 28 '22 at 09:02
  • yes, that I understand. What I am getting is YUV2 format out (as per the image). The actual output of the sensor is raw16, which was internally (in the UVC descriptors) mapped to the YUV2 GUID to produce the YUV2 output format. This IR sensor is research-oriented, so there is no model number, sorry. Could you please help me understand this data: what it actually is, and what I need to check further to get the image. Thanks. – Emlinux Jan 29 '22 at 05:36
  • It's going to be very challenging to find a solution without some documentation. Can you please post the code you are using for grabbing the frames (I know it's generic, but just for having a baseline...). Do you understand that IR sensor has no color, so the format can't be YUV (U and V are chroma channels)? – Rotem Jan 29 '22 at 09:04
  • @Rotem thanks for your comment; this is what I defined like (updated the post) since I am using fx3 board to interface the sensor, I followed this document to interface my sensor. please refer: https://www.infineon.com/dgdl/Infineon-AN75779_How_to_Implement_an_Image_Sensor_Interface_with_EZ-USB_FX3_in_a_USB_Video_Class_(UVC)_Framework-ApplicationNotes-v13_00-EN.pdf?fileId=8ac78c8c7cdc391c017d073ad2b85f0d – Emlinux Jan 29 '22 at 14:09
  • I thought you were using `opencv-python` for getting the colored image in your post. Use the generic example for grabbing video from your camera (grab a colored frame that looks like the posted frame). In case you can't read the video frames using OpenCV, I don't think I can help you. – Rotem Jan 29 '22 at 16:56
  • @Rotem yes, I tested it with opencv-python test example as host application to stream the data receiving through usb. I updated post with opencv-python code, please check and suggest~ thanks. – Emlinux Jan 30 '22 at 05:43
  • I edited you post with a example for grabbing video frames without decoding. The important line is `cap.set(cv2.CAP_PROP_FORMAT, -1)`. Can you please execute the sample, and tell what are the printed values of `frame.shape` and `frame.dtype`? – Rotem Jan 30 '22 at 08:52
  • @Rotem Thanks for editing and sharing the example. Here is the print log... frame.shape = (240, 340, 3) frame.dtype = uint8 frame.shape = (240, 340, 3) frame.dtype = uint8 frame.shape = (240, 340, 3) frame.dtype = uint8 frame.shape = (240, 340, 3) frame.dtype = uint8 frame.shape = (240, 340, 3) frame.dtype = uint8 frame.shape = (240, 340, 3) frame.dtype = uint8 frame.shape = (240, 340, 3) frame.dtype = uint8 frame.shape = (240, 340, 3) frame.dtype = uint8 frame.shape = (240, 340, 3) frame.dtype = uint8 frame.shape = (240, 340, 3) frame.dtype = uint8 – Emlinux Jan 30 '22 at 15:39
  • Is that so??? I was expecting the shape to be `(1, 41472)`. I am guessing Media Foundation backend doesn't support `cap.set(cv2.CAP_PROP_FORMAT, -1)`. Can you try replacing `cap = cv2.VideoCapture(0, cv2.CAP_MSMF)` with `cap = cv2.VideoCapture(0, cv2.CAP_FFMPEG)`? – Rotem Jan 30 '22 at 15:55
  • In case `CAP_FFMPEG` is not working, you may try the following [example](https://stackoverflow.com/questions/66909370/thermal-image-processing). Add `cap.set(cv2.CAP_PROP_FOURCC, cv2.VideoWriter.fourcc('Y','1','6',' '))` and `cap.set(cv2.CAP_PROP_CONVERT_RGB, 0)`. – Rotem Jan 30 '22 at 16:04
  • I updated your question with two more tests. please tell me if `frame.shape` and/or `frame.dtype` are different? – Rotem Jan 30 '22 at 16:06
  • @Rotem CAP_FFMPEG didn't work. So I tried the next one and here is the log. frame.shape = (1, 163200) frame.dtype = uint8 frame.shape = (1, 163200) frame.dtype = uint8 frame.shape = (1, 163200) frame.dtype = uint8 frame.shape = (1, 163200) frame.dtype = uint8 frame.shape = (1, 163200) frame.dtype = uint8 frame.shape = (1, 163200) frame.dtype = uint8 frame.shape = (1, 163200) frame.dtype = uint8 frame.shape = (1, 163200) frame.dtype = uint8 frame.shape = (1, 163200) frame.dtype = uint8 frame.shape = (1, 163200) frame.dtype = uint8 – Emlinux Jan 30 '22 at 16:32
  • Great progress... I updated your question with one more code sample. This time we are reshaping the frame to 680x240, and save it as PNG. Can you grab a frame and add `img.png` to your post? – Rotem Jan 30 '22 at 17:19
  • @Rotem please check the updated post (img.png added). – Emlinux Jan 30 '22 at 17:39
  • I am getting the following [result](https://i.stack.imgur.com/d53Bj.png). Does it make sense? – Rotem Jan 30 '22 at 17:59
  • @Rotem Thanks but I am not sure about it. sorry I didn't understand this 'result'. since the sensor is thermal IR, but initially I am just trying to get image as per raw output and later to map with coloring and temperature mapping. please suggest~ – Emlinux Jan 30 '22 at 18:17
  • I am not going to help you with the coloring and temperatures. I am trying to help you to grab the raw frames. Can you take a picture of something hot? Something that is going to be meaningful for testing? I don't know the sensitivity, so I don't know how hot it should be. I think the black and white dots are just dead pixels. – Rotem Jan 30 '22 at 18:21
  • @Rotem Thank you for making me understand about it. I added the image (having hot in front of sensor (firelighter)). It shows light spot in middle. Could you please check and suggest~ – Emlinux Jan 31 '22 at 03:41

1 Answer

The true pixel format of your video is int16 grayscale, but it is marked as YUV2 (probably for compatibility with grabbers that do not support 16-bit formats).

I saw the same technique used by the RAVI format.

The default behavior of OpenCV is converting the frames from YUV2 to BGR format.
Since the format has no color (and is just marked as YUV2), the conversion messes up your data.

I could be wrong here... but it looks like the format is "big endian" and signed 16 bits.
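For reference, a raw 340x240 stream at 2 bytes per pixel gives exactly the 163200-byte buffers reported in the comments above, which is consistent with the 16-bit-grayscale theory. A quick sanity check (on a zero-filled stand-in buffer, not a real frame):

```python
import numpy as np

# A 340x240 sensor at 2 bytes/pixel (16-bit grayscale marked as YUV2)
# yields 163200 bytes per frame, matching the (1, 163200) shape that
# cap.read() returns with CAP_PROP_FORMAT set to -1.
cols, rows = 340, 240
bytes_per_pixel = 2
raw_size = cols * rows * bytes_per_pixel
print(raw_size)  # 163200

# The same buffer viewed as 16-bit elements gives one element per pixel.
raw = np.zeros((1, raw_size), np.uint8)  # stand-in for a grabbed raw frame
as_u16 = raw.view(np.uint16).reshape(rows, cols)
print(as_u16.shape)  # (240, 340)
```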


Here is a complete code sample for grabbing and displaying the video:

import cv2
import numpy as np

# open video0
cap = cv2.VideoCapture(0, cv2.CAP_MSMF)

# set width and height
cols, rows = 340, 240
cap.set(cv2.CAP_PROP_FRAME_WIDTH, cols)
cap.set(cv2.CAP_PROP_FRAME_HEIGHT, rows)
cap.set(cv2.CAP_PROP_FPS, 30) # set fps

# Disable the conversion to BGR by setting FOURCC to Y16 and `CAP_PROP_CONVERT_RGB` to 0.
cap.set(cv2.CAP_PROP_FOURCC, cv2.VideoWriter.fourcc('Y','1','6',' ')) 
cap.set(cv2.CAP_PROP_CONVERT_RGB, 0)    

# Fetch undecoded RAW video streams
cap.set(cv2.CAP_PROP_FORMAT, -1)  # Format of the Mat objects. Set value -1 to fetch undecoded RAW video streams (as Mat 8UC1)

while True:
    # Capture frame-by-frame
    ret, frame = cap.read()

    if not ret:
        break

    # Convert the frame from uint8 elements to big-endian signed int16 format.
    frame = frame.reshape(rows, cols*2)  # Reshape to 680x240
    frame = frame.astype(np.uint16)  # Convert uint8 elements to uint16 elements
    frame = (frame[:, 0::2] << 8) + frame[:, 1::2]  # Combine each big-endian byte pair into one 16-bit value (byte swap); the result is 340x240.
    frame = frame.view(np.int16)  # The data is actually signed 16 bits - view it as int16.

    # Apply some processing for display (this part is just "cosmetics"):
    frame_roi = frame[:, 10:-10]  # Crop to 320x240 (the left and right margins are not meant to be displayed).
    # frame_roi = cv2.medianBlur(frame_roi, 3)  # Clean the dead pixels (just for better viewing).
    frame_roi = frame_roi << 3  # Discard the 3 upper bits ???
    normed = cv2.normalize(frame_roi, None, 0, 255, cv2.NORM_MINMAX, cv2.CV_8U)  # Convert to uint8 with min-max normalization (just for viewing).

    cv2.imshow('normed', normed)  # Show the normalized video frame

    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

    # cv2.imwrite('normed.png', normed)

cap.release()
cv2.destroyAllWindows()

Left shifting each pixel by 3 (frame_roi = frame_roi << 3) fixes most of the issues.

It could be that the upper 3 bits are unused, or have some different meaning?

The ROI cropping and normalizing are just "cosmetics", so you can see something.
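To illustrate why the shift can help (a sketch on synthetic values, under the assumption that only the lower 13 bits carry signal): shifting left by 3 drops the upper 3 bits and stretches the remaining signal, so the min-max normalization has more dynamic range to work with.

```python
import numpy as np

# Hypothetical illustration: suppose the useful signal occupies only the
# lower 13 bits of each 16-bit pixel. Left-shifting by 3 discards the
# (possibly meaningless) upper 3 bits and scales the signal up by 8.
frame = np.array([[0x0100, 0x0110], [0x0120, 0x0130]], dtype=np.int16)
shifted = frame << 3

# The spread between min and max grows by a factor of 8, which gives
# cv2.normalize more contrast to stretch.
print(int(frame.max() - frame.min()))      # 48
print(int(shifted.max() - shifted.min()))  # 384
```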

Here is the processed image you have posted (with the hot object):



For little endian, replace the following lines:

frame = frame.reshape(rows, cols*2)  # Reshape to 680x240
frame = frame.astype(np.uint16)  # Convert uint8 elements to uint16 elements
frame = (frame[:, 0::2] << 8) + frame[:, 1::2]  # Combine each big-endian byte pair into one 16-bit value (byte swap); the result is 340x240.
frame = frame.view(np.int16)  # The data is actually signed 16 bits - view it as int16.

With:

frame = frame.view(np.int16).reshape(rows, cols)

In case the values are all positive (uint16 type), try:

frame = frame.view(np.uint16).reshape(rows, cols)
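To check which byte order actually applies, it may help to decode the same byte pair both ways (a toy sketch on synthetic bytes, not real sensor data) and see which decoding of a real frame yields a plausible image:

```python
import numpy as np

# Toy example: the byte pair (0x01, 0x02) decodes to different values
# depending on the assumed byte order.
raw = np.array([0x01, 0x02], dtype=np.uint8)

# Big-endian interpretation (first byte is the high byte),
# equivalent to the manual byte swap used above:
be = raw.view('>u2')
print(int(be[0]))  # 258 (0x0102)

# Little-endian interpretation, equivalent to frame.view(np.uint16)
# on a little-endian (x86) machine:
le = raw.view('<u2')
print(int(le[0]))  # 513 (0x0201)
```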

Sketch code for processing the image for display:

import cv2
import numpy as np

rows, cols = 240, 340  # Frame dimensions

frame = cv2.imread('hand1.png', cv2.IMREAD_UNCHANGED)  # Read input image (680x240 grayscale uint8)


# create a CLAHE object (Arguments are optional).
clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))


# Convert the frame from uint8 elements to big-endian signed int16 format.
frame = frame.reshape(rows, cols * 2)  # Reshape to 680x240
frame = frame.astype(np.uint16)  # Convert uint8 elements to uint16 elements
frame = (frame[:, 0::2] << 8) + frame[:, 1::2]  # Combine each big-endian byte pair into one 16-bit value (byte swap); the result is 340x240.
frame = frame.view(np.int16)  # The data is actually signed 16 bits - view it as int16.

# Apply some processing for display (this part is just "cosmetics"):
frame_roi = frame[:, 10:-10]  # Crop to 320x240 (the left and right margins are not meant to be displayed).
# frame_roi = cv2.medianBlur(frame_roi, 3)  # Clean the dead pixels (just for better viewing).

#frame_roi = frame_roi << 3  # Discard the 3 upper bits ???
frame_roi = frame_roi << 1  # Discard the upper bit ???

# Fix the offset difference between the odd and even columns (note: this is not a good solution).
#frame_as_uint16 = (frame_roi.astype(np.int32) + 32768).astype(np.uint16)
frame_as_uint16 = frame_roi.view(np.uint16)  # Try to interpret the data as unsigned
frame_as_float = frame_as_uint16.astype(np.float32) / 2  # Divide by 2 for avoiding overflow
med_odd = np.median(frame_as_float[:, 0::2])
med_evn = np.median(frame_as_float[:, 1::2])
med_dif = med_odd - med_evn
frame_as_float[:, 0::2] -= med_dif/2
frame_as_float[:, 1::2] += med_dif/2
frame_as_uint16 = np.round(frame_as_float).clip(0, 2**16-1).astype(np.uint16)

cl1 = clahe.apply(frame_as_uint16)  # Apply contrast enhancement.
normed = cv2.normalize(cl1, None, 0, 255, cv2.NORM_MINMAX, cv2.CV_8U)  # Convert to uint8 with normalizing (just for viewing the image).

cv2.imwrite('normed.png', normed)

cv2.imshow('normed', normed)
cv2.waitKey()
cv2.destroyAllWindows()
Rotem
  • Thank you for the clear explanation. Yes, you are right. The true pixel is 2 bytes each and is marked as YUV2 format for compatibility. The format is 'little endian'. Can I test using 'little endian'? I added the processed image (with hot object). Please check. Thanks. – Emlinux Jan 31 '22 at 03:50
  • I know it supposed to be little endian, but the result looks like noise. I added to my post an example for little endian conversion. There is probably some other issue (camera configuration issue?). Why are you using `cv2.CAP_MSMF`? Is it working with `cv::CAP_DSHOW`? Do you know if the pixels suppose to have negative values? – Rotem Jan 31 '22 at 09:26
  • yes seems noisy. I posted the tested image (with little endian). tested with both 'np.int16' and 'np.uint16'. I checked with using "cap = cv2.VideoCapture(0, cv2.CAP_DSHOW)" but it gives me error as "frame = frame.view(np.int16).reshape(rows, cols) ValueError: When changing to a larger dtype, its size must be a divisor of the total size in bytes of the last axis of the array." – Emlinux Jan 31 '22 at 10:22
  • OK, check frame.shape and frame.dtype – Rotem Jan 31 '22 at 10:36
  • Using this `cap = cv2.VideoCapture(0, cv2.CAP_DSHOW)` gives me... `frame.shape = (240, 340, 3) frame.dtype = uint8 frame.shape = (240, 340, 3) frame.dtype = uint8 frame.shape = (240, 340, 3) frame.dtype = uint8 frame.shape = (240, 340, 3) frame.dtype = uint8 frame.shape = (240, 340, 3) frame.dtype = uint8 frame.shape = (240, 340, 3) frame.dtype = uint8 frame.shape = (240, 340, 3) frame.dtype = uint8 frame.shape = (240, 340, 3) frame.dtype = uint8 frame.shape = (240, 340, 3) frame.dtype = uint8 frame.shape = (240, 340, 3) frame.dtype = uint8` – Emlinux Jan 31 '22 at 12:38
  • It looks like `CAP_DSHOW` it's not working properly, because according to the shape, the frame is converted to BGR (the 3 color channels applies BGR format). I am out of ideas... – Rotem Jan 31 '22 at 13:51
  • ok, I will try to check more details and recheck the configuration in case. Please suggest me if there is any try solution comes up in your mind. I wish to try that . Thanks. – Emlinux Jan 31 '22 at 16:12
  • I tried it like... 1) I changed the receiving frame data format to 'Y16' in the UVC driver. 2) Since the format is now 'Y16', I commented out the line `#cap.set(cv2.CAP_PROP_FOURCC, cv2.VideoWriter.fourcc('Y','1','6',' '))`. 3) I ran the python script with CAP_DSHOW and it displays the test image (please check the posted image). Seems CAP_DSHOW is working. Please suggest if this makes any sense. – Emlinux Feb 01 '22 at 15:14
  • It looks like `CAP_DSHOW` after the update, gives the same result as `CAP_MSMF`. – Rotem Feb 01 '22 at 15:40
  • yes, looks like! – Emlinux Feb 01 '22 at 15:57
  • Is there any way to configure the camera outside of OpenCV and Python? Does the camera comes with some kind of control software? Can you contact the manufacture for support? – Rotem Feb 01 '22 at 16:07
  • not sure but I will try. Could you please let me know what should I check more specifically with the camera configuration? since there are pre-defined registers with default values to get the default image of size 340x240 initially. – Emlinux Feb 01 '22 at 16:14
  • The size looks correct. Search for other configuration parameters. Is there a way to configure the camera output to synthetic pattern (for testing) instead of IR video? – Rotem Feb 01 '22 at 16:19
  • sorry, I am not familiar with such a way to test (with synthetic pattern). Also, I just re-checked 'byte order' of pixel-data-word that is 'big-endian' (msb first) instead of little-endian. sorry for misunderstood earlier. – Emlinux Feb 01 '22 at 16:53
  • I added left shifting each pixel by 3 (`frame_roi = frame_roi << 3`), it fixes most of the issues. There is something weird that looks like column interlace. I never encountered columns interlace (only rows interlace). Could it be column interlace? – Rotem Feb 01 '22 at 17:53
  • Thanks for your update. I can see the similar after update. May I please know the cause of shifting by 3? seems, it should be row interlace scan. how can I check the general image (no hot object) like 'hand in front' something, its showing complete gray. Is it so? please suggest~ – Emlinux Feb 02 '22 at 06:41
  • I don't know if the shifting is the correct solution. All I know is that discarding the upper 3 bits of each pixel improves the result. Add an image with your hand as `680x240` PNG image to your post. I see if there is a way to see it. You should also be aware to the fact that you need to do some calibration procedures (I don't know the calibration procedures, because it is specific to your camera model). – Rotem Feb 02 '22 at 08:49
  • Okay. I'll go through it to get the calibration way, if there is any. 680x240 image posted. Please check. Thank you! – Emlinux Feb 02 '22 at 09:19
  • The following [image](https://i.stack.imgur.com/EGCiU.png) is a result of my best attempt. No hand... before applying camera calibrations, I suggest you to check all the registers configuration. It looks like there is a configuration issue. – Rotem Feb 02 '22 at 15:31
  • I reconfigured the registers with defaults. re-test and posted the image (680*240). Could you please check that? – Emlinux Feb 03 '22 at 11:39
  • Still no hand... I added the "sketch code" that I used for testing. You may try it yourself. Now it looks like the data is unsigned (`uint16` and not `int16`), and only shift left by 1 is required (not by 3). – Rotem Feb 03 '22 at 14:39
  • Thanks. since I am trying to test it with different combinations, there is no result yet. but in some case I can see shadow image (_waving fingers_) grayscale, how can I filter this to get proper image? any suggestions please! – Emlinux Feb 08 '22 at 04:29
  • Using `cv2.createCLAHE` as I posted, is the best way I can think of. Beside that, you have to map the dead pixels and replace them with the neighbors (dead pixels are "stealing" some contrast). Another important thing is reducing the **non-uniformity**. I posted a basic columns correction (by matching the medians). The right way applies calibration procedures. – Rotem Feb 08 '22 at 09:03
  • Hello. Thanks for your suggestion. I tried some calibration ways as per the sensor information and not sure, but better is that I can at least see the image shadow (_finger shadow, image posted_) but it s not clear (if this is due to the contrast, how can I implement that?). Also attached the image in 680*240 format as 'fing.png'. Please suggest~ Thanks! – Emlinux Feb 22 '22 at 07:07
  • It looks like you didn't try the "Sketch code for processing the image for display" code part. The image for display is supposed to be 320x240 and not 340x240. – Rotem Feb 22 '22 at 08:31
  • I tried the same just comment out the crop "_frame_roi = frame[:, 10:-10]_" and tested with _frame_roi = frame[:, :]_ to keep it as 340x240. Does it effect much? I can try with 320x240 too. – Emlinux Feb 22 '22 at 08:52
  • tested with _"frame_roi = frame[:, 10:-10]"_ ; image posted. Please check. – Emlinux Feb 22 '22 at 08:59
  • It looks like something with the sensor configuration is still not right. It could also be related to the camera optics. We can go with this forever... I don't think I can help you any further. – Rotem Feb 22 '22 at 09:59
  • I am also not sure...anyway, thanks for your kind suggestions and help so far. – Emlinux Feb 23 '22 at 09:09