Live audio streaming container formats

Question

When I start receiving a live audio (radio) stream (e.g. MP3 or AAC), I assume the received data is not a raw bitstream (i.e. raw encoder output) but is always wrapped in some container format. If this assumption is correct, then I guess I cannot start decoding from an arbitrary position in the stream, but have to wait for some sync byte. Is that right? Is it usual to have sync bytes? Is there a header following the sync byte from which I can determine the codec, number of channels, sample rate, etc.?

When I connect to a live stream, will I receive data starting at the nearest sync byte, or will I get it from the current position and have to look for the sync byte first?

Some streams, like Icecast, include stream-related information in the HTTP response headers, but I think I can skip those and deal directly with the stream format.

Is that correct?
Regards,
STeN

score 3 · Answer 1 · answered Jul 17 '11 at 19:43

When you look at SHOUTcast/Icecast, the data that comes across is pure MPEG Layer III audio data, and nothing more. (Provided you haven't requested metadata.)

It can be cut at an arbitrary place, so you need to sync to the stream. This is usually done by finding a potential header, and using the data in that header to find sequential headers. Once you have found a few frame headers, you can safely assume you have synced up to the stream and start decoding for playback.
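The sync procedure described above can be sketched in Python roughly as follows. This is only a minimal sketch, assuming Layer III audio with a fixed (non-"free") bitrate; the names `mp3_frame_length` and `find_sync` are made up for illustration, and the tables follow the usual MPEG audio frame header layout.

```python
# Layer III bitrates in kbps, indexed by the 4-bit bitrate field.
# Index 0 ("free" format) and 15 (invalid) are rejected in this sketch.
BITRATES_V1_L3 = [None, 32, 40, 48, 56, 64, 80, 96, 112,
                  128, 160, 192, 224, 256, 320, None]
BITRATES_V2_L3 = [None, 8, 16, 24, 32, 40, 48, 56, 64,
                  80, 96, 112, 128, 144, 160, None]
# Sample rates keyed by the 2-bit version field (3 = MPEG-1, 2 = MPEG-2, 0 = MPEG-2.5).
SAMPLE_RATES = {3: [44100, 48000, 32000],
                2: [22050, 24000, 16000],
                0: [11025, 12000, 8000]}


def mp3_frame_length(data, pos):
    """Return the frame length in bytes if a plausible Layer III header
    starts at data[pos], otherwise None."""
    if pos + 4 > len(data):
        return None
    b0, b1, b2 = data[pos], data[pos + 1], data[pos + 2]
    if b0 != 0xFF or (b1 & 0xE0) != 0xE0:   # 11 sync bits must all be set
        return None
    version = (b1 >> 3) & 0x03              # 1 is reserved
    layer = (b1 >> 1) & 0x03                # 1 means Layer III
    rate_index = (b2 >> 2) & 0x03           # 3 is reserved
    if version == 1 or layer != 1 or rate_index == 3:
        return None
    table = BITRATES_V1_L3 if version == 3 else BITRATES_V2_L3
    kbps = table[b2 >> 4]
    if kbps is None:
        return None
    padding = (b2 >> 1) & 0x01
    coefficient = 144 if version == 3 else 72
    return coefficient * kbps * 1000 // SAMPLE_RATES[version][rate_index] + padding


def find_sync(data, needed=3):
    """Return the first offset at which `needed` consecutive frame headers
    line up, or None if the buffer does not contain such a run."""
    for start in range(len(data) - 3):
        pos = start
        for _ in range(needed):
            length = mp3_frame_length(data, pos)
            if length is None:
                break
            pos += length
        else:
            return start
    return None
```

Everything before the returned offset would be discarded, and decoding starts at that offset. Requiring several headers in a row guards against audio payload bytes that happen to look like a sync pattern.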

Again, there is no "container format" for these. It's just raw data.

Now, if you want metadata, you have to request it from the server. The data is then just injected into the stream every x number of bytes. See http://www.smackfu.com/stuff/programming/shoutcast.html.
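As a rough illustration of how the injected metadata can be separated from the audio, here is a minimal Python sketch. It assumes the usual SHOUTcast (ICY) scheme described on the linked page: the client asked for metadata, the server announced the interval in an `icy-metaint` response header, and after every `metaint` audio bytes there is one length byte followed by that value times 16 bytes of NUL-padded text. The function name `split_icy_stream` is hypothetical, and `stream` is assumed to be a file-like object whose `read(n)` returns exactly `n` bytes unless the stream ends.

```python
import io


def split_icy_stream(stream, metaint):
    """Yield ("audio", bytes) and ("meta", str) items from a stream that
    has ICY metadata blocks interleaved every `metaint` audio bytes."""
    while True:
        audio = stream.read(metaint)
        if audio:
            yield "audio", audio
        if len(audio) < metaint:
            return                      # stream ended inside an audio block
        length_byte = stream.read(1)
        if not length_byte:
            return
        meta_len = length_byte[0] * 16  # the length byte counts 16-byte units
        if meta_len:
            meta = stream.read(meta_len)
            yield "meta", meta.rstrip(b"\x00").decode("utf-8", errors="replace")


# Usage with an in-memory buffer standing in for the network stream.
demo = io.BytesIO(b"AAAAAAAA" + bytes([1]) + b"StreamTitle='x';" + b"BBBBBBBB" + b"\x00")
for kind, payload in split_icy_stream(demo, 8):
    print(kind, payload)
```

A length byte of zero means no metadata at that point, which is the common case between title changes.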

Brad

Thanks for the answer. I know about the SHOUTcast metadata insertion, but I would like to avoid it. Anyway, you say this is raw data, but looking at the link suggested by 'yi_H' I found that the encoded data is divided into frames, i.e. the whole stream is built from frames, each with its own header and audio data. That's exactly what I was looking for. But I do not know (sorry for that lack of knowledge) whether this format is the same for MP3 and AAC streams... – STeN Jul 18 '11 at 05:16
I would also like to know how 'MPEG-4 Part 14' fits into this (http://en.wikipedia.org/wiki/MPEG-4_Part_14). Sorry for asking, but the first steps into this are hard... – STeN Jul 18 '11 at 05:17

score 3 · Answer 2 · edited May 23 '17 at 10:29

Doom9 has great starting info about both MPEG and AAC frame formats. SHOUTcast will add some 'metadata' now and then, and handling it is really trivial. The thing I want to share with you is this: I have an application that can capture all kinds of streams, and SHOUTcast, both AAC and MP3, is among them. The first versions cut their files at an arbitrary point according to time, for example every 5 minutes, regardless of the MP3/AAC frames. That was more or less OK for MP3 (the files were playable) but very bad for aacPlus.

The thing is, the aacPlus decoder ISN'T that forgiving about wrong data, and I had everything from access violations to mysterious software shutdowns with no errors of any kind.

Anyway, if you want to capture a stream, open a socket to the server and read the response. You'll have some headers there; use that info to strip the metadata that will be injected now and then. Use the frame header information for both aacPlus and MP3 to determine frame boundaries, and try to honor them and split the file at the right place.

mp3 frame header:

http://www.mp3-tech.org/programmer/frame_header.html

aacplus frame header:

http://wiki.multimedia.cx/index.php?title=Understanding_AAC

also this:

aacplus frame alignment problems
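For the AAC side, a frame header can be read in a similar way. The sketch below assumes the stream uses ADTS framing (an assumption, not something confirmed in this thread) and reads only the fixed 7-byte part of the header. The name `parse_adts_header` is hypothetical.

```python
# Sample rates indexed by the 4-bit sampling_frequency_index field.
ADTS_SAMPLE_RATES = [96000, 88200, 64000, 48000, 44100, 32000,
                     24000, 22050, 16000, 12000, 11025, 8000, 7350]


def parse_adts_header(data, pos=0):
    """Return a dict of ADTS header fields if a plausible header starts at
    data[pos], otherwise None."""
    if pos + 7 > len(data):
        return None
    b = data[pos:pos + 7]
    # 12-bit syncword 0xFFF; the 2-bit layer field must be 0 for ADTS,
    # which also rejects MPEG Layer I/II/III headers.
    if b[0] != 0xFF or (b[1] & 0xF6) != 0xF0:
        return None
    rate_index = (b[2] >> 2) & 0x0F
    if rate_index >= len(ADTS_SAMPLE_RATES):
        return None
    return {
        "has_crc": (b[1] & 0x01) == 0,          # protection_absent bit is 0
        "object_type": (b[2] >> 6) + 1,         # 2 means AAC LC
        "sample_rate": ADTS_SAMPLE_RATES[rate_index],
        "channel_config": ((b[2] & 0x01) << 2) | (b[3] >> 6),
        # 13-bit length of the whole frame, header included.
        "frame_length": ((b[3] & 0x03) << 11) | (b[4] << 3) | (b[5] >> 5),
    }
```

Because `frame_length` covers the header as well as the payload, adding it to the current offset gives the position of the next header, so the same "check several headers in a row" approach used for MP3 applies when looking for frame boundaries.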

Hi, if you can share some more links/information about exactly which header information I should look at when analyzing aacPlus and MP3 to determine frame boundaries, I would highly appreciate it. Thanks! – STeN Jul 18 '11 at 06:26
Arbitrarily splitting MP3 files generally works, because each MP3 frame is self-contained and carries its own header. That's why you can, for example, combine several MP3 files (with cat, say) and the result will play. MP3 was originally designed for satellite radio streaming. Later formats such as AAC and others are not so robust; they contain a single header at the beginning of the file. – PkP Jun 22 '16 at 11:07

score 2 · Answer 3 · answered Jul 17 '11 at 13:24

Unfortunately it's not always that easy, check the format and notes here: MPEG frame header format

Karoly Horvath

Cool! I have finally found a bug: I captured the data, located the frame header, compared it with the stream information I had, and there was a difference in sample rate... Do you know if there is a similar format for AAC, or do both use the same one? – STeN Jul 17 '11 at 14:14

score 0 · Answer 4 · answered Jul 18 '11 at 05:22

I will continue the discussion by answering myself (even though we are discouraged from doing that):

I was also looking into the streamed data and found that the sequence ff f3 82 70 is repeated frequently. I suspect this is the MPEG frame header, so I tried to work out what it means:

ff f3 82 70 (hex) = 11111111 11110011 10000010 01110000 (bin)

Analysis
11111111111 | SYNC
10          | MPEG version 2
01          | Layer III
1           | No CRC
1000        | 64 kbps
00          | 22050Hz
1           | Padding
0           | Private 
01          | Joint stereo
11          | ...
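The same breakdown can be reproduced in code. This is a minimal Python sketch that decodes the fields listed above from a 4-byte header, restricted to Layer III to keep the tables short. The function name `decode_mp3_header` and the dictionary keys are made up for illustration.

```python
import struct

VERSIONS = {0: "MPEG-2.5", 2: "MPEG-2", 3: "MPEG-1"}
CHANNEL_MODES = ["stereo", "joint stereo", "dual channel", "mono"]
# Layer III bitrates in kbps; index 0 is "free" format, 15 is invalid.
BITRATES_V1_L3 = [None, 32, 40, 48, 56, 64, 80, 96, 112,
                  128, 160, 192, 224, 256, 320, None]
BITRATES_V2_L3 = [None, 8, 16, 24, 32, 40, 48, 56, 64,
                  80, 96, 112, 128, 144, 160, None]
SAMPLE_RATES = {3: [44100, 48000, 32000],
                2: [22050, 24000, 16000],
                0: [11025, 12000, 8000]}


def decode_mp3_header(header):
    """Decode the main fields of a 4-byte MPEG audio Layer III frame header."""
    (word,) = struct.unpack(">I", header[:4])   # big-endian 32-bit value
    if word >> 21 != 0x7FF:
        raise ValueError("sync bits not set")
    version = (word >> 19) & 0x3
    layer = (word >> 17) & 0x3
    rate_index = (word >> 10) & 0x3
    if version == 1 or layer != 1 or rate_index == 3:
        raise ValueError("reserved value, or not Layer III (not handled here)")
    table = BITRATES_V1_L3 if version == 3 else BITRATES_V2_L3
    return {
        "version": VERSIONS[version],
        "layer": "Layer III",
        "crc": ((word >> 16) & 0x1) == 0,        # protection bit 0 means CRC present
        "bitrate_kbps": table[(word >> 12) & 0xF],
        "sample_rate": SAMPLE_RATES[version][rate_index],
        "padding": bool((word >> 9) & 0x1),
        "private": bool((word >> 8) & 0x1),
        "channel_mode": CHANNEL_MODES[(word >> 6) & 0x3],
    }


print(decode_mp3_header(bytes.fromhex("fff38270")))
```

For ff f3 82 70 this gives MPEG-2, Layer III, no CRC, 64 kbps, 22050 Hz, padding set and joint stereo, which agrees with the manual analysis.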

Any comments to that?

When I start receiving the streaming data, should I discard all data before this header before passing the buffer to the class that deals with the DSP? I know this can be implementation specific, but I would like to know what the usual procedure is here...

BR STeN

STeN

Depending on the DSP you use, you can even feed it garbage and it will play what's playable and discard the rest. – Daniel Mošmondor Jul 18 '11 at 06:03
Yeah, I also think this depends on the HW/drivers. Currently, if I do not handle the MP3/AAC stream at all, I always get 1-2 seconds of random sounds (longer at higher bitrates, and worse with AAC than with MP3) before the live stream starts playing. I think dropping data until the MPEG sync header arrives might help... But is the 'MPEG frame header' I am describing the one I am looking for, or is there something else? Are MP3 and AAC streams the same? I captured an AAC stream as well and tried to analyze it, but I am not sure whether it uses the same format as MP3 or not. – STeN Jul 18 '11 at 06:21
The MPEG frame header is all you need to know once you have stripped out the metadata. AAC and MP3 are completely different things. – Daniel Mošmondor Jul 18 '11 at 07:48
