MPEG-2 AAC frame-by-frame audio decoding

Question

I have individual audio frames encoded in MPEG-2 AAC. Each frame encodes 1024 16-bit PCM samples.

I notice that each AAC frame is a different size. I assume this is a result of the MPEG-2 AAC compression algorithm and perfectly normal.

I need a way to decode a single frame and get back the original 1024 PCM samples (with error from lossy compression, that's fine).

I couldn't find information about the MPEG-2 AAC algorithm ANYWHERE online. It's kinda nuts.

I've been trying a crude workaround using a library called pydub, which has a few methods that call FFmpeg's AAC decoder. Trying to load the audio frame as an AudioSegment using AAC encoding:

from io import BytesIO
from pydub import AudioSegment
audioData = BytesIO(frame)  # frame: bytes of a single AAC frame
sound = AudioSegment.from_file(audioData, format="aac")

gives the following error:

[aac @ 000002d444c1aa00] Estimating duration from bitrate, this may be inaccurate
Input #0, aac, from 'C:\Users\jmk_m\AppData\Local\Temp\tmpjl3x0xao':
  Duration: 00:00:00.19, bitrate: 23 kb/s
Stream #0:0: Audio: aac (LC), 22050 Hz, mono, fltp, 23 kb/s
Stream mapping:
Stream #0:0 -> #0:0 (aac (native) -> pcm_s16le (native))
Press [q] to stop, [?] for help
Output #0, wav, to 'C:\Users\jmk_m\AppData\Local\Temp\tmpxmp942e4':
Metadata:
ISFT            : Lavf58.10.100
Stream #0:0: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 22050 Hz, mono, s16, 352 kb/s
Metadata:
      encoder         : Lavc58.13.100 pcm_s16le
[aac @ 000002d444cc7480] Reserved bit set.
[aac @ 000002d444cc7480] Prediction is not allowed in AAC-LC.
Error while decoding stream #0:0: Invalid data found when processing input
[aac @ 000002d444cc7480] Reserved bit set.
[aac @ 000002d444cc7480] Prediction is not allowed in AAC-LC.
Error while decoding stream #0:0: Invalid data found when processing input
[aac @ 000002d444cc7480] Prediction is not allowed in AAC-LC.
Error while decoding stream #0:0: Invalid data found when processing input
size=       2kB time=00:00:00.04 bitrate= 366.2kbits/s speed=5.45x
video:0kB audio:2kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 3.808594%
Conversion failed!

If anyone has any insights as to what may be causing the error, or any alternative approaches, that'd be greatly appreciated!

It's not a file. It's a file-like object using BytesIO in Python. AAC requires header information to be read like an audio file. I don't have that, since I'm simply working with a single audio frame. - jmkmay, May 04 '18 at 13:55
This line makes it look like there is a file there: "Input #0, aac, from 'C:\Users\jmk_m\AppData\Local\Temp\tmpjl3x0xao':". It also shows your input is AAC-LC, and this one suggests your input has an error: "Error while decoding stream #0:0: Invalid data found when processing input". - Andrew, May 06 '18 at 23:28
That's the location where FFmpeg puts a temp file of the audio bytes to do processing. The library requires a place to put audio data when passed a raw bytes object, so it creates a temporary file and puts it there. - jmkmay, May 07 '18 at 13:21
I think it is PyDub creating the temp file (https://github.com/jiaaro/pydub/blob/master/pydub/audio_segment.py#L474 and https://github.com/jiaaro/pydub/blob/master/pydub/audio_segment.py#L490-L508). You could likely add some extra logging there to get more info on what your error is. I'd say you likely have enough above regarding invalid data on input, but if you dumped your input to a file it would be available for further diagnosis. - Andrew, May 08 '18 at 21:30
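
On the missing-header point raised in the first comment: one common way to make a bare AAC frame readable by a file-oriented decoder is to prefix it with a 7-byte ADTS header. The following is a minimal sketch of that idea, not a confirmed fix for the error above. It assumes the frame is a raw AAC payload with no header of its own, and the default parameters (22050 Hz, mono, LC profile) are simply copied from what FFmpeg reported in the log, which may itself be a misdetection. The names build_adts_header, has_adts_syncword and wrap_frame are hypothetical helpers introduced here for illustration.

```python
# Sampling-frequency index table used by the ADTS header (index -> Hz).
ADTS_SAMPLE_RATES = [96000, 88200, 64000, 48000, 44100, 32000,
                     24000, 22050, 16000, 12000, 11025, 8000, 7350]


def build_adts_header(payload_len, sample_rate=22050, channels=1,
                      profile=1, mpeg2=True):
    """Return a 7-byte ADTS header (no CRC) for one raw AAC frame.

    profile: 0 = Main, 1 = LC, 2 = SSR.
    Defaults are assumptions taken from the FFmpeg log in the question.
    """
    sf_index = ADTS_SAMPLE_RATES.index(sample_rate)
    frame_len = payload_len + 7      # the length field counts the header too
    if frame_len >= 1 << 13:
        raise ValueError("frame too large for the 13-bit length field")
    fullness = 0x7FF                 # value conventionally used for variable bitrate
    return bytes([
        0xFF,                                          # syncword, high 8 bits
        0xF0 | (int(mpeg2) << 3) | 0x01,               # syncword low bits, ID, layer=0, no CRC
        (profile << 6) | (sf_index << 2) | (channels >> 2),
        ((channels & 0x3) << 6) | (frame_len >> 11),
        (frame_len >> 3) & 0xFF,
        ((frame_len & 0x7) << 5) | (fullness >> 6),
        (fullness & 0x3F) << 2,                        # one raw data block per frame
    ])


def has_adts_syncword(data):
    """Heuristic: True if data starts with the 12-bit ADTS syncword 0xFFF."""
    return len(data) >= 2 and data[0] == 0xFF and (data[1] & 0xF0) == 0xF0


def wrap_frame(raw_frame, **params):
    """Prefix one raw AAC frame with an ADTS header unless it already has one."""
    if has_adts_syncword(raw_frame):
        return raw_frame
    return build_adts_header(len(raw_frame), **params) + raw_frame
```

The result of wrap_frame(frame) could then be passed to AudioSegment.from_file(BytesIO(...), format="aac") as in the question. If the frames turn out to use a profile other than LC, the profile argument would need to change accordingly. Note also that AAC decoders overlap each frame with the previous one, so a frame decoded in isolation will generally not reproduce its 1024 samples as closely as it would when decoded in sequence.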
