
How can I encode HDR10 videos from RGB images using the ffmpeg command-line, with the proper color and luminance metadata?

There are a lot of ffmpeg command-line examples floating around when searching for information about how to encode HDR10 streams properly, but I didn't find any with a comprehensive list of all the parameters you can tune when using RGB frames as input.

For example:

  • This article has a lot of information, but uses an existing video as input.
  • Similarly, this article uses an existing video as input.
F.X.

1 Answer


Traditional videos need to have the following sets of metadata:

  • The transfer function determines how you should map encoded RGB or YUV values into display luminance (-color_trc in ffmpeg options, transfer in x265 options). Common options for HDR videos are HLG (arib-std-b67) or PQ (smpte2084), and a proper application of HDR10 requires PQ.
  • The colorspace primaries define how encoded RGB values map into real colors (-color_primaries for ffmpeg, colorprim for x265). For common HDR formats (including HDR10) you need bt2020.
  • The color matrix is used to convert between RGB values (which are used to display data on the screen) and YUV values (which make the video encoders more efficient, because they roughly separate luma (Y) from chroma (UV) channels), and correspond to -colorspace in ffmpeg and colormatrix in x265. For common HDR formats (including HDR10) you need bt2020nc.
  • The signal range determines if RGB values between 0-100% are mapped to the full 0-255 (in 8-bit) or 0-1023 (in 10-bit) range, or if a margin is reserved for internal use (-color_range in ffmpeg, range in x265). It is traditionally "limited" (or: "tv", "narrow") in videos and "full" (or: "pc") in RGB images.
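To check which of these four properties a given file already carries, ffprobe can print them at the stream level (input.mov below is a placeholder for your own file):

```shell
# Print the four "traditional" color properties of the first video stream.
ffprobe -hide_banner -loglevel error \
  -select_streams v:0 \
  -show_entries stream=color_range,color_space,color_transfer,color_primaries \
  -of default=noprint_wrappers=1 \
  input.mov
```

Any field reported as "unknown" is one that players will have to guess, which is exactly the kind of mismatch we want to avoid.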

HDR10 defines additional metadata on top of that, which are intended to be used by the OS/player to tune the video output on non-HDR capable screens so that they can be displayed as best as they can under those constraints:

  • MaxFALL and MaxCLL describe respectively the maximum frame-average luminance and the maximum per-pixel luminance of the video (both are passed via the max-cll x265 option)
  • Mastering display characteristics describe the "perfect" display the video is intended to be displayed on (master-display x265 option)
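The master-display string encodes chromaticity coordinates in units of 0.00002 (CIE x,y) and luminance in units of 0.0001 cd/m². As a sketch of where the integers come from (assuming a P3-D65 mastering display with a 1000-nit peak and 0.0001-nit black, which is what the string used later in this answer corresponds to):

```shell
# Derive the x265 master-display string from chromaticities (CIE x,y)
# and luminance (cd/m^2). x/y are encoded in units of 0.00002,
# luminance in units of 0.0001 cd/m^2.
enc_xy()  { awk -v v="$1" 'BEGIN { printf "%d", v / 0.00002 + 0.5 }'; }
enc_lum() { awk -v v="$1" 'BEGIN { printf "%d", v / 0.0001  + 0.5 }'; }

# P3-D65 primaries, D65 white point, 1000-nit peak, 0.0001-nit minimum:
printf 'G(%s,%s)B(%s,%s)R(%s,%s)WP(%s,%s)L(%s,%s)\n' \
  "$(enc_xy 0.265)"  "$(enc_xy 0.690)" \
  "$(enc_xy 0.150)"  "$(enc_xy 0.060)" \
  "$(enc_xy 0.680)"  "$(enc_xy 0.320)" \
  "$(enc_xy 0.3127)" "$(enc_xy 0.3290)" \
  "$(enc_lum 1000)"  "$(enc_lum 0.0001)"
# -> G(13250,34500)B(7500,3000)R(34000,16000)WP(15635,16450)L(10000000,1)
```

Ideally these values come from your actual mastering display; the P3-D65/1000-nit string above is simply a very common default.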

Now, a few things need some special care:

  • Traditional metadata can be set in both the video container and the x265 stream, so they should be set in both the ffmpeg and x265 options to avoid mismatches during playback.
  • Traditional metadata should be set for both the input (before -i) and output (after) formats, so that ffmpeg can do proper conversions when encoding.
  • HDR10 metadata are only present in the x265 options; ffmpeg knows nothing about them and they don't affect the video pixel values.
  • Unfortunately the default filter used by ffmpeg for format conversion (swscale, or -vf scale on the command-line) has a lot of issues when applied automatically, which makes it tricky to get right. I found that it is best to always specify at least the signal range explicitly, or even to use zscale, which replaces swscale and is much better behaved.
  • YUV to/from RGB conversions (which is what we're doing here since we use RGB images as inputs!) are different between traditional SDR (BT709) videos and HDR10 (BT2020). This is a very common source of errors, because a lot of ffmpeg internals assume the default (potentially wrong) conversion if not explicitly specified in both input and output formats!
  • The -pix_fmt for the input images should be determined automatically, but I included it here to be comprehensive; it is also required if you want to stream from raw RGB data.
  • The -pix_fmt for the output video specifies that we want to pass 10-bit YUV data to the encoder and is required. HDR10 requires 10-bits with 4:2:0 subsampling i.e. yuv420p10le.
  • This command-line assumes your TIFF images already contain the correct color profile (they are already Rec.2020 PQ images). If you want to do conversions from e.g. sRGB or other spaces you need to set the specific formats for the input file and zscale filter properly.
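For that last point, a sketch of what the filter graph could look like for plain sRGB inputs (untested; it linearizes sRGB at a nominal 100-nit peak, converts primaries, then applies PQ — the npl value and the whole chain are assumptions to verify against your material):

```shell
# Hypothetical replacement for the -vf option below, for sRGB inputs:
# linearize sRGB -> float RGB -> BT.2020 primaries -> PQ transfer,
# BT.2020 non-constant-luminance matrix, limited range.
-vf "zscale=tin=iec61966-2-1:t=linear:npl=100,format=gbrpf32le,zscale=pin=709:p=2020,zscale=t=smpte2084:m=2020_ncl:r=limited"
```

The input-side metadata options would also need to change to match (e.g. -color_trc iec61966-2-1 and -color_primaries bt709 before -i).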

Bearing that in mind, we can use the following command-line to convert Rec2020 PQ RGB images into an HDR10 Rec2020 PQ x265/HEVC video:

ffmpeg \

  # Some basic options
  -hide_banner \
  -loglevel verbose \

  # Scaling flags for the input
  # The print_info flag is very useful to debug cases when swscale is applied automatically!
  # The other flags ensure ffmpeg favors accuracy over (very limited) performance gains.
  -sws_flags print_info+accurate_rnd+bitexact+full_chroma_int \

  # Traditional metadata for input images (note the "pc" or "full" range)
  -color_range pc \
  -color_trc smpte2084 \
  -color_primaries bt2020 \
  -colorspace bt2020nc \
  -pix_fmt rgb48be \

  # Input frames as 16-bit big-endian RGB TIFF images with pixel values in Rec2020 PQ
  -framerate 30 -start_number 1 -i input-%03d.tif \

  # Explicitly configure the range conversion (so as to avoid using swscale)
  -vf zscale=rangein=full:range=limited \

  # Specify output codec (x265/HEVC)
  -c:v libx265 \

  # Traditional metadata for output video (note the "tv" or "limited" range)
  -color_range tv \
  -color_trc smpte2084 \
  -color_primaries bt2020 \
  -colorspace bt2020nc \
  -pix_fmt yuv420p10le \

  # Scaling flags for the output
  -sws_flags print_info+accurate_rnd+bitexact+full_chroma_int \

  # x265 HDR10 metadata
  -x265-params colorprim=bt2020:transfer=smpte2084:colormatrix=bt2020nc:range=limited:master-display=G(13250,34500)B(7500,3000)R(34000,16000)WP(15635,16450)L(10000000,1):max-cll=1000,400 \

  # Output file
  output.mov
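Once the file is produced, it's worth checking that the HDR10 metadata actually made it into the stream. One way (the exact entry names and output layout may vary with your ffmpeg version) is to ask ffprobe for the first decoded frame and its side data:

```shell
# Inspect the color properties and side data of the first decoded frame;
# mastering display and content light level metadata appear as side data.
ffprobe -hide_banner -loglevel error \
  -select_streams v:0 \
  -read_intervals "%+#1" \
  -show_entries frame=color_space,color_primaries,color_transfer,side_data_list \
  output.mov
```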

EDIT 2022/08/29: Changed the proposed command-line and clarified the use of the proper transfer function; technically a correct application of the HDR10 format requires PQ, but the previous answer used HLG. HLG-encoded videos are still HDR, but cannot really be called "HDR10".
