
Background

I'm developing a progressive download media streaming server in C++ using Boost. A typical configuration is an Android rendering device running Android 4.2.2, using the stock Gallery player as the media player, with the media streaming server running on a Windows desktop. The Android device requests a media file via an HTTP URL, and the media server streams the file using progressive download.

The Problem

When attempting to stream a video file with an internal bit rate of 20 Mbps, the renderer consistently stalls numerous times. A typical streaming session is essentially repeated occurrences of these steps:

  1. Render smoothly for 3-5 seconds
  2. Video stalls for 5-10 seconds
  3. Go to step 1

The most logical explanation is that the renderer is experiencing "buffer underrun" (also called "buffer underflow").

The Question

Is there any way to solve the buffer underrun problem, improve the media streamer's output rate, and prevent the visible stalling/blocking?

Technical Information

The server code looks like this:

void StreamFile (boost::asio::ip::tcp::socket *socket, const wchar_t *path)
{
    . . .
    for (long offset=startOffset; offset <= endOffset; offset+=streamingBlockSize)
    {
        long numBytesToRead = (std::min<long>) (endOffset - offset + 1, streamingBlockSize);
        size_t numBytesRead = fread (buffer, 1, numBytesToRead, f);
        if (numBytesRead == 0)
        {
            // fread() failed or hit an unexpected EOF, exit
            break;
        }
        if (RawSocketWrite (socket, buffer, numBytesRead) == 0)
        {
            // RawSocketWrite() encountered a serious error, exit
            break;
        }
    }
    . . .
}

size_t RawSocketWrite (boost::asio::ip::tcp::socket *socket, const char *data, size_t len)
{
    size_t numCharsWritten = 0;

    try
    {
        numCharsWritten = boost::asio::write (socket, boost::asio::buffer (data, len));
    }
    catch (boost::system::system_error& e)
    {
        LOG_ERROR (("error", "write() failed in RawSocketWrite (socket %d) %s", socket->native (), e.what()));
        numCharsWritten = 0;
    }

    return numCharsWritten;
}

I'm trying to stream a 39 MB, 16-second video file with the following characteristics (courtesy of MediaInfo):

Video information

General
Complete name                            : TestVideo.mp4
Format                                   : MPEG-4
Format profile                           : Base Media
Codec ID                                 : isom
File size                                : 38.6 MiB
Duration                                 : 16s 102ms
Overall bit rate                         : 20.1 Mbps

Video
ID                                       : 1
Format                                   : AVC
Format/Info                              : Advanced Video Codec
Format profile                           : High@L4.0
Format settings, CABAC                   : Yes
Format settings, ReFrames                : 1 frame
Format settings, GOP                     : M=1, N=61
Codec ID                                 : avc1
Codec ID/Info                            : Advanced Video Coding
Duration                                 : 15s 701ms
Bit rate                                 : 20.0 Mbps
Width                                    : 1 920 pixels
Height                                   : 1 080 pixels
Display aspect ratio                     : 16:9
Frame rate mode                          : Variable
Frame rate                               : 30.000 fps
Minimum frame rate                       : 29.732 fps
Maximum frame rate                       : 30.313 fps
Color space                              : YUV
Chroma subsampling                       : 4:2:0
Bit depth                                : 8 bits
Scan type                                : Progressive
Bits/(Pixel*Frame)                       : 0.322
Stream size                              : 37.5 MiB (97%)
Title                                    : VideoHandle
Language                                 : English
mdhd_Duration                            : 15701

Audio
ID                                       : 2
Format                                   : AAC
Format/Info                              : Advanced Audio Codec
Format profile                           : LC
Codec ID                                 : 40
Duration                                 : 16s 102ms
Source duration                          : 16s 131ms
Bit rate mode                            : Constant
Bit rate                                 : 192 Kbps
Nominal bit rate                         : 96.0 Kbps
Channel(s)                               : 2 channels
Channel positions                        : Front: L R
Sampling rate                            : 48.0 KHz
Compression mode                         : Lossy
Stream size                              : 374 KiB (1%)
Source stream size                       : 375 KiB (1%)
Title                                    : SoundHandle
Language                                 : English
mdhd_Duration                            : 16102

The StreamFile() function streams blocks of 'streamingBlockSize' bytes in a tight loop to the output socket ('streamingBlockSize' is set through a configuration file, and was introduced while researching and debugging the current problem of buffer underrun).

Tracking the packets using Wireshark shows packets with 1448 bytes of streaming data being sent at an even pace:

|Time     | 192.168.0.197                         |
|         |                   | 192.168.0.199     |                   
|14.420722000|         SYN, ACK  |                   |Seq = 0 Ack = 1|         |(10243)  ------------------>  (58358)  |
|14.437750000|         PSH, ACK - Len: 266           |Seq = 1 Ack = 188|         |(10243)  ------------------>  (58358)  |
|14.437924000|         ACK - Len: 1448               |Seq = 267 Ack = 188|         |(10243)  ------------------>  (58358)  |
|14.437939000|         ACK - Len: 1448               |Seq = 1715 Ack = 188|         |(10243)  ------------------>  (58358)  |
|14.437950000|         ACK - Len: 1448               |Seq = 3163 Ack = 188|         |(10243)  ------------------>  (58358)  |
|14.442016000|         ACK - Len: 1448               |Seq = 4611 Ack = 188|         |(10243)  ------------------>  (58358)  |
|14.444269000|         ACK - Len: 1448               |Seq = 6059 Ack = 188|         |(10243)  ------------------>  (58358)  |
|14.444293000|         ACK - Len: 1448               |Seq = 7507 Ack = 188|         |(10243)  ------------------>  (58358)  |
|14.444358000|         ACK - Len: 1448               |Seq = 8955 Ack = 188|         |(10243)  ------------------>  (58358)  |
|14.444373000|         ACK - Len: 1448               |Seq = 10403 Ack = 188|         |(10243)  ------------------>  (58358)  |
|14.444389000|         ACK - Len: 1448               |Seq = 11851 Ack = 188|         |(10243)  ------------------>  (58358)  |
. . .
|72.768739000|         ACK - Len: 1448               |Seq = 40488067 Ack = 188|         |(10243)  ------------------>  (58358)  |
|72.768766000|         ACK - Len: 1448               |Seq = 40489515 Ack = 188|         |(10243)  ------------------>  (58358)  |
|72.772484000|         ACK - Len: 1448               |Seq = 40490963 Ack = 188|         |(10243)  ------------------>  (58358)  |
|72.772521000|         PSH, ACK - Len: 895           |Seq = 40492411 Ack = 188|         |(10243)  ------------------>  (58358)  |

Wireshark provides very useful summary information about the packets above via the Statistics>Summary menu item:

Packets                        27997
Between first and last packet  58.352 sec
Avg. packets/sec               479.797
Avg. packet size               1513.586 bytes
Bytes                          42375867
Avg. bytes/sec                 726213.548
Avg. MBit/sec                  5.810

This tells us it took 58.352 seconds to transfer a 39 MB video that has a playing time of 16.102 seconds and whose renderer encounters frequent stalling. This sounds like a classic case of buffer underrun.

In addition, the average rate measured by Wireshark was 5.81 Mbps. By definition this can never keep up with a renderer that needs to display a video with an overall bit rate of 20.1 Mbps.
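
Working through the numbers from the Wireshark summary: 42,375,867 bytes x 8 = ~339 Mbit, and 339 Mbit / 58.352 s = ~5.81 Mbit/s, so the summary figures are self-consistent. Conversely, to feed a 20.1 Mbps stream in real time, the 38.6 MiB file (~324 Mbit) would have to be delivered in about 324 / 20.1 = ~16 seconds, i.e. roughly 3.6 times faster than the observed 58-second transfer.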

Possible Fixes

While researching the problem I've come across several technical issues that might contribute to it, and I would appreciate your thoughts on each.

Increase size of buffer passed to write()

I've tried varying the number of bytes passed to the write() function (e.g., 4096, 8192, 16384) to see if a larger block size speeds up the transfer. It does not seem to make a difference (see the discussion of MTU and MSS below for a possible explanation).
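
A related knob, separate from the block size handed to write(), is the size of the socket's kernel send buffer (SO_SNDBUF), which bounds how much data the OS will queue ahead of the receiver's ACKs. This is not in the current server code; a minimal sketch of setting it with Boost.Asio (the function name is hypothetical and the 256 KiB value is just an illustrative guess that would need measurement) would be:

#include <boost/asio.hpp>

// Enlarge the kernel send buffer for an already-open socket.
// Illustrative sketch only, not part of the existing server; the
// 256 KiB value is a guess and would need tuning.
void EnlargeSendBuffer (boost::asio::ip::tcp::socket *socket)
{
    boost::asio::socket_base::send_buffer_size option (256 * 1024);
    socket->set_option (option);
}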

Increase Ethernet MTU (Maximum Transmission Unit) and/or TCP MSS (Maximum Segment Size)

Wireshark shows each TCP packet carrying 1448 bytes of raw video data. Would increasing the MTU or MSS improve streaming throughput? http://www.stratus.com/blog/openvos/?p=1459 has an interesting comparison of MTU and MSS.
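
For reference, 1448 bytes is just what a standard 1500-byte Ethernet MTU yields: 1500 - 20 (IP header) - 20 (TCP header) = 1460 bytes of MSS, minus what is most likely 12 bytes of TCP timestamps option = 1448 bytes of payload per segment. So the capture suggests the path is already using the normal MTU; a larger MTU (jumbo frames) would only help if every hop between server and renderer supported it, and the per-packet overhead it would save (a few percent) is small compared with the 3-4x shortfall in throughput.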

TCP_NODELAY

There are several pages discussing the TCP_NODELAY socket option (see http://en.wikipedia.org/wiki/Transmission_Control_Protocol). My understanding is that TCP_NODELAY disables Nagle's algorithm, which holds back small segments (typically the last, partially filled block of a transfer) until earlier data has been acknowledged; combined with the receiver's delayed ACK (up to roughly 200 ms) this mainly adds latency at the tail of small transfers. With TCP_NODELAY there is no such delay. In a single large video streaming situation, where the socket is kept full of large writes, I would not expect an improvement. Is this correct?
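
If you do want to rule it out experimentally, Boost.Asio exposes the option directly. A minimal sketch (the function name is hypothetical; the socket is assumed to be the connected tcp::socket from StreamFile() above):

#include <boost/asio.hpp>

// Disable Nagle's algorithm (i.e. set TCP_NODELAY) on a connected socket.
// Illustrative sketch only -- as discussed above, this is unlikely to help
// a bulk transfer that keeps the send buffer full.
void DisableNagle (boost::asio::ip::tcp::socket *socket)
{
    boost::asio::ip::tcp::no_delay option (true);
    socket->set_option (option);
}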

Network load variability

Could the network being used cause the data to stream too slowly?

boost::asio::write() is a blocking write -- would a non-blocking write help?

The boost::asio::write() call in RawSocketWrite() above is a blocking write:

try
{
    numCharsWritten = boost::asio::write (socket, boost::asio::buffer (data, len));
}

Is there maybe an intrinsic delay when using a blocking write() as opposed to a non-blocking write()? Would using a non-blocking write improve throughput?
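
For comparison, a rough sketch of what an asynchronous version might look like is below. The names OnWriteComplete and AsyncSocketWrite are hypothetical, the buffer must stay alive until the completion handler runs, and the io_service driving the socket has to be running (e.g. via io_service::run()) somewhere; this is not the existing server code:

#include <boost/asio.hpp>

// Completion handler for the asynchronous write (hypothetical name).
void OnWriteComplete (const boost::system::error_code &error, size_t numBytesWritten)
{
    if (error)
    {
        // e.g. log the error and close the connection
        return;
    }
    // queue the next block from here instead of looping synchronously
}

// Non-blocking counterpart of RawSocketWrite() (hypothetical name).
void AsyncSocketWrite (boost::asio::ip::tcp::socket *socket, const char *data, size_t len)
{
    // NOTE: 'data' must remain valid until OnWriteComplete() runs
    boost::asio::async_write (*socket,
                              boost::asio::buffer (data, len),
                              &OnWriteComplete);
}

That said, a blocking write that keeps the socket's send buffer full should already saturate the link, so switching to async_write mainly frees the serving thread (useful for handling several clients) rather than raising the throughput of a single stream.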

Many thanks in advance for your help.

Moshe Rubin
  • how do you test it? over WiFi? Have you tested download bandwidth? – Andriy Tylychko May 28 '13 at 12:54
  • Apologies for taking so long to respond -- I needed to download LAN speed measuring software. When connected by Wi-Fi to the router I measured the server upload speed at 24 Mbps with choppy results. When connected by Ethernet cable to the router I measured the server upload speed at about 600 Mbps and rendering was consistently perfect. I believe this shows that my server software is serving up the video at a good rate. Is it safe to assume that the software is fine while the network is the probable source of choppiness? – Moshe Rubin May 29 '13 at 11:49
  • That's why you have buffers: to reduce choppiness. If you have a large enough buffer and sufficient bandwidth on average, you shouldn't see any choppiness. – Andriy Tylychko May 29 '13 at 13:16
  • Since replying to you, I tested on a network with an Ethernet cabled bandwidth of 65 Mbps and still encountered choppiness on the renderer. What troubles me is that potential users will not always be connected via Ethernet cable. Assuming the renderer's buffer (which I do not control) is large enough, why is a bandwidth of 65 Mbps not sufficient for a 20.1 Mbps bit rate video to render smoothly? – Moshe Rubin May 29 '13 at 14:48
  • Collect some stats, e.g. data sending speed, and log them. TCP_NODELAY cannot help you; it's useful only for intensive two-way communication where latency matters. TCP itself can produce significant delays, but again a large buffer should mask this problem. MTU size won't help you either, the default value should be fine. TCP window size can help avoid waiting on too-frequent ACKs from the receiver. But first of all, gather more info on what's going on. – Andriy Tylychko May 29 '13 at 17:01
  • When you refer to having a "large enough buffer" and that a "large enough buffer should mask this problem", are you referring to the size of the buffer passed to the TCP _write()_ function, or the TCP MSS (Maximum Segment Size)? If the latter, how can one change the MSS buffer size? – Moshe Rubin May 30 '13 at 06:26
  • Nope, I'm referring to your receiver buffer, or renderer buffer -- anything that can store enough data to mask temporary network glitches. – Andriy Tylychko May 30 '13 at 08:41

1 Answer


I am only offering pointers, not a solution.

As you would guess, this issue involves multiple aspects, any one of which could cause the stalling: how the video is encoded (does it have B-frames, only I-frames, etc.), the bandwidth of the network the Android device uses to reach the HTTP server and how congested it is during the test, and the decoder library and player application. Not a solution, but you can try the same test and observe the behaviour while:

1) Using a different decoder/renderer application

2) Running the test at different times of day (when the network load is possibly lower)

3) Decoding/rendering on a different OS (Linux with mplayer, or Windows with Windows Media Player), just to see whether it has anything to do with the Android TCP/IP stack implementation

goldenmean
  • Thanks for the pointers. I had considered them, but it's good to hear it from someone else. Regarding (1), I'm dealing with rendering software that selects a particular renderer, so there's no leeway to select a different one. With (2) I certainly see an improvement when the network is less busy (e.g., when my co-workers go home for the day). As for (3) I've had good results when rendering using DLNA from my Windows machine to the rendering machine. In this case I *can* select the renderer application, but then I'm not testing my server software. – Moshe Rubin May 29 '13 at 12:09
  • Well then, based on your finding in (3) above, it could possibly be your server socket code. As you pointed out, try 1) using a non-blocking write(), and 2) if possible, POSIX sockets instead of the boost::asio library (I know it is new code, but you could add it to the existing code under a compile-time macro switch) as a test to localize whether the asio socket API calls are the cause. I am reasonably sure TCP_NODELAY is not going to solve your issue here or help in any way. Good luck and keep us posted on your progress. – goldenmean May 30 '13 at 13:25
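
A minimal sketch of that plain-sockets test might look like the following. The function name RawSocketWriteNative and the NativeSocket typedef are hypothetical, not existing code; the socket is assumed to be already connected (and Winsock already initialized on Windows), and error handling is stripped to the bare minimum:

#include <cstddef>
#ifdef _WIN32
#include <winsock2.h>
typedef SOCKET NativeSocket;
#else
#include <sys/socket.h>
typedef int NativeSocket;
#endif

// Plain-sockets counterpart of RawSocketWrite(), usable behind a
// compile-time macro switch for comparison against the boost::asio path.
size_t RawSocketWriteNative (NativeSocket sock, const char *data, size_t len)
{
    size_t total = 0;
    while (total < len)
    {
        int sent = (int) send (sock, data + total, (int)(len - total), 0);
        if (sent <= 0)
        {
            return 0;   // mirror RawSocketWrite(): 0 signals a serious error
        }
        total += (size_t) sent;
    }
    return total;
}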