I have individual audio frames encoded in MPEG-2 AAC. Each frame encodes 1024 16-bit PCM samples.
I notice that each AAC frame is a different size. I assume this is a result of the MPEG-2 AAC compression algorithm and perfectly normal.
I need a way to decode a single frame and get back the original 1024 PCM samples (with error from lossy compression, that's fine).
I couldn't find information about the MPEG-2 AAC algorithm ANYWHERE online. It's kinda nuts.
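On the varying frame sizes: if the frames begin with a 7-byte ADTS header (an assumption, I have not confirmed this for my data), the header itself carries each frame's length and profile. A minimal sketch for inspecting it; the function name `parse_adts_header` is made up for illustration:

```python
def parse_adts_header(frame: bytes):
    """Return basic ADTS header fields, or None if no ADTS sync word is found.

    Assumes the standard 7-byte ADTS header layout (no CRC handling here).
    """
    # Sync word is 12 set bits; the 2-bit layer field after the ID bit must be 0.
    if len(frame) < 7 or frame[0] != 0xFF or (frame[1] & 0xF6) != 0xF0:
        return None
    return {
        "mpeg2": bool(frame[1] & 0x08),           # ID bit: 1 = MPEG-2, 0 = MPEG-4
        "profile": frame[2] >> 6,                 # 0 = Main, 1 = LC, 2 = SSR
        "sr_index": (frame[2] >> 2) & 0x0F,       # 7 corresponds to 22050 Hz
        "channels": ((frame[2] & 0x01) << 2) | (frame[3] >> 6),
        # 13-bit length of header plus payload, in bytes
        "frame_length": ((frame[3] & 0x03) << 11) | (frame[4] << 3) | (frame[5] >> 5),
    }
```

If this returns None for a frame, the frame is probably a raw payload without a header; if it returns a dict, `frame_length` should match `len(frame)` and will differ from frame to frame.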
I've been trying a crude workaround using a library called pydub, which contains a few methods that use FFmpeg's AAC decoder. Trying to load the audio frame as an AudioSegment using AAC encoding:
from io import BytesIO
from pydub import AudioSegment
audioData = BytesIO(frame)  # frame: bytes holding one AAC frame
sound = AudioSegment.from_file(audioData, format="aac")
gives the following error:
[aac @ 000002d444c1aa00] Estimating duration from bitrate, this may be inaccurate
Input #0, aac, from 'C:\Users\jmk_m\AppData\Local\Temp\tmpjl3x0xao':
Duration: 00:00:00.19, bitrate: 23 kb/s
Stream #0:0: Audio: aac (LC), 22050 Hz, mono, fltp, 23 kb/s
Stream mapping:
Stream #0:0 -> #0:0 (aac (native) -> pcm_s16le (native))
Press [q] to stop, [?] for help
Output #0, wav, to 'C:\Users\jmk_m\AppData\Local\Temp\tmpxmp942e4':
Metadata:
ISFT : Lavf58.10.100
Stream #0:0: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 22050 Hz, mono, s16, 352 kb/s
Metadata:
encoder : Lavc58.13.100 pcm_s16le
[aac @ 000002d444cc7480] Reserved bit set.
[aac @ 000002d444cc7480] Prediction is not allowed in AAC-LC.
Error while decoding stream #0:0: Invalid data found when processing input
[aac @ 000002d444cc7480] Reserved bit set.
[aac @ 000002d444cc7480] Prediction is not allowed in AAC-LC.
Error while decoding stream #0:0: Invalid data found when processing input
[aac @ 000002d444cc7480] Prediction is not allowed in AAC-LC.
Error while decoding stream #0:0: Invalid data found when processing input
size= 2kB time=00:00:00.04 bitrate= 366.2kbits/s speed=5.45x
video:0kB audio:2kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 3.808594%
Conversion failed!
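One thing that may be worth ruling out: FFmpeg's `aac` input format expects an ADTS stream, so a raw frame without a header could be misread. The sketch below builds a 7-byte ADTS header to prepend to a raw payload. It is only an illustration under assumptions: the function name `build_adts_header` is hypothetical, and the defaults (LC profile, 22050 Hz, mono, MPEG-2) are taken from what the log above reports, which may not be what the frames really contain. In particular, the "Prediction is not allowed in AAC-LC" message might mean the data uses the Main profile (`profile=0`), but that is a guess.

```python
def build_adts_header(payload_len: int, profile: int = 1, sr_index: int = 7,
                      channels: int = 1, mpeg2: bool = True) -> bytes:
    """Build a 7-byte ADTS header (no CRC) for one raw AAC payload.

    profile:  0 = Main, 1 = LC, 2 = SSR (assumed default: LC)
    sr_index: sampling frequency index, 7 = 22050 Hz (assumed default)
    channels: channel configuration, 1 = mono (assumed default)
    """
    frame_len = payload_len + 7  # the length field counts the header too
    if not 0 <= payload_len or frame_len >= (1 << 13):
        raise ValueError("payload does not fit in the 13-bit length field")
    header = bytearray(7)
    header[0] = 0xFF                                   # sync word, high 8 bits
    header[1] = 0xF0 | (int(mpeg2) << 3) | 0x01        # sync low bits, ID, layer=0, no CRC
    header[2] = (profile << 6) | (sr_index << 2) | (channels >> 2)
    header[3] = ((channels & 0x03) << 6) | (frame_len >> 11)
    header[4] = (frame_len >> 3) & 0xFF
    header[5] = ((frame_len & 0x07) << 5) | 0x1F       # buffer fullness 0x7FF (VBR)
    header[6] = 0xFC                                   # one raw data block per frame
    return bytes(header)
```

The result could then be tried as `BytesIO(build_adts_header(len(frame)) + frame)` in place of `BytesIO(frame)`. If the frames already start with `0xFF 0xF9` or `0xFF 0xF1`, they likely have headers already and this step would not apply.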
If anyone has any insights as to what may be causing the error, or any alternative approaches, that'd be greatly appreciated!