"We have a device that creates video files in MP4 file format containing H.264 video data."
Two possibilities:
Least likely...
(1) Those null bytes appear to be padding to make everything 32-bit aligned. This way you can read through the SPS in 4-byte chunks (using some readInt() command or similar).
For your length of 52 bytes (0x34) you would get 13 integers/chunks.
PS: Bytes can also be zero-padded so that the next NALU starts on a new line/row
(eg: this is obvious when the data is displayed in a traditional "16 bytes per row" hex view).
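To illustrate the "read in 4-byte chunks" idea, here is a minimal Python sketch. The function name read_as_words and the sample bytes are made up for illustration (they are not your real SPS); the only point is that a 52-byte payload splits cleanly into 13 big-endian 32-bit integers.

```python
import struct

def read_as_words(payload):
    """Interpret payload as a list of big-endian unsigned 32-bit integers."""
    if len(payload) % 4 != 0:
        raise ValueError("payload is not 32-bit aligned")
    count = len(payload) // 4
    return list(struct.unpack(">%dI" % count, payload))

# Hypothetical 52-byte SPS: a 0x67 header byte, 47 filler bytes,
# then 4 zero bytes standing in for the padding discussed above.
sps = bytes([0x67]) + bytes(range(1, 48)) + bytes(4)

words = read_as_words(sps)
print(len(sps), len(words))
```

If the padding theory were right, the last integer would simply read as zero.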
Most likely...
(2) Those 4 zero-bytes are valid bytes of your SPS, since the NALU size encapsulates them within the SPS data. This would answer your question of "Are these technically allowed by the standard?" as Yes, since they are part of the actual SPS data itself. You unknowingly confirmed this with your "Within the stsd header, in the AVCConfigurationBox, we also see these extra null bytes." ...because they are supposed to be there.
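If you want to double-check this on the stsd side, a rough Python sketch of reading the SPS/PPS entries out of the avcC payload might look like the following. It assumes the usual AVCDecoderConfigurationRecord layout (5 header bytes, an SPS count in the low 5 bits of the next byte, then 16-bit length-prefixed entries) and ignores any optional trailing fields. The function name parse_avcc and the sample record are hypothetical, not bytes from your file.

```python
import struct

def parse_avcc(record):
    """Return (nal_length_size, sps_list, pps_list) from an avcC payload."""
    nal_length_size = (record[4] & 0x03) + 1   # lengthSizeMinusOne + 1
    num_sps = record[5] & 0x1F                 # low 5 bits = SPS count
    pos = 6
    sps_list = []
    for _ in range(num_sps):
        (size,) = struct.unpack_from(">H", record, pos)
        pos += 2
        sps_list.append(record[pos:pos + size])
        pos += size
    num_pps = record[pos]
    pos += 1
    pps_list = []
    for _ in range(num_pps):
        (size,) = struct.unpack_from(">H", record, pos)
        pos += 2
        pps_list.append(record[pos:pos + size])
        pos += size
    return nal_length_size, sps_list, pps_list

# Made-up record: one 52-byte SPS ending in 4 zero bytes, one 4-byte PPS.
sps = bytes([0x67, 0x64, 0x00, 0x1F]) + bytes(44) + bytes(4)
pps = bytes([0x68, 0xEE, 0x3C, 0x80])
record = (bytes([0x01, 0x64, 0x00, 0x1F, 0xFF, 0xE1])
          + struct.pack(">H", len(sps)) + sps
          + bytes([0x01]) + struct.pack(">H", len(pps)) + pps)

nal_length_size, sps_list, pps_list = parse_avcc(record)
print(nal_length_size, len(sps_list[0]), len(pps_list[0]))
```

The declared SPS length here covers the trailing zero bytes too, which is the same situation you describe seeing in your own AVCConfigurationBox.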
...
"In an Annex-B byte-stream, they would be allowed, but not here, I think."
Note: SPS is known as Codec Private Data and can be stored as either Annex-B or AVCC format regardless of the MP4's own format (eg: they can be mixed together in some MP4 files).
...
"We have some Python code checking this and complaining. So do we need to change the code in the device, or the checking code?"
I would leave the MP4 bytes as they are (from device?) and just fix the checking side. For example, what does it actually complain about? If the size is 52 bytes, then it must read the following 52 bytes as SPS content. Then it can confirm a new NALU by skipping +4 bytes (to skip past the "length size" bytes) and checking whether the next byte is 0x06 for SEI, 0x65 for a keyframe, or 0x41 for a P/B frame.
In your image: It looks like you have 52 bytes of SPS, then 4 bytes of PPS and then 36 bytes of SEI.
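As a rough idea of what the checking side could do, here is a minimal Python sketch that walks length-prefixed NAL units and trusts the declared size instead of guessing where the SPS ends. It assumes a 4-byte length prefix; the name walk_nal_units and the sample bytes are hypothetical and only mimic the 52 / 4 / 36 byte layout from your image.

```python
NAL_TYPE_NAMES = {1: "non-IDR slice", 5: "IDR slice", 6: "SEI", 7: "SPS", 8: "PPS"}

def walk_nal_units(data, length_size=4):
    """Yield (offset, size, nal_unit_type) for each length-prefixed NAL unit."""
    pos = 0
    while pos + length_size <= len(data):
        size = int.from_bytes(data[pos:pos + length_size], "big")
        start = pos + length_size
        if size == 0 or start + size > len(data):
            raise ValueError("bad NAL size %d at offset %d" % (size, pos))
        nal_type = data[start] & 0x1F   # low 5 bits of the NAL header byte
        yield pos, size, nal_type
        pos = start + size              # trust the declared size, padding included

# Made-up payloads: 52-byte SPS (last 4 bytes zero), 4-byte PPS,
# 36-byte SEI, then a short keyframe slice starting with 0x65.
sps = b"\x67" + bytes(47) + bytes(4)
pps = b"\x68\xee\x3c\x80"
sei = b"\x06" + bytes(35)
idr = b"\x65\x88\x84\x00"
sample = b"".join(len(n).to_bytes(4, "big") + n for n in (sps, pps, sei, idr))

units = list(walk_nal_units(sample))
for offset, size, nal_type in units:
    print(offset, size, NAL_TYPE_NAMES.get(nal_type, "other"))
```

A checker built this way never inspects the tail of the SPS, so the 4 zero bytes would not trigger a complaint; it only fails if a declared size runs past the end of the data.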