Background
I'm developing a progressive download media streaming server in C++ using Boost. A typical configuration is an Android rendering device running Android 4.2.2, with the stock Gallery app as the media player, and the media streaming server running on a Windows desktop. The Android device requests a media file via an HTTP URL, and the media server streams the file using progressive download.
The Problem
When I attempt to stream a video file with an internal bit rate of 20 Mbps, the renderer consistently stalls many times. A typical streaming session essentially consists of repeated cycles of the following steps:
1. Render smoothly for 3-5 seconds
2. Video stalls for 5-10 seconds
3. Go to step 1
The most logical explanation is that the renderer is experiencing "buffer underrun" (also called "buffer underflow").
The Question
Is there any way to solve the buffer underrun problem by improving the media streamer's output rate, and so prevent the visual stalling/blocking?
Technical Information
The server code looks like this:
void StreamFile (boost::asio::ip::tcp::socket *socket, const wchar_t *path)
{
    . . .
    for (long offset = startOffset; offset <= endOffset; offset += streamingBlockSize)
    {
        // The final block may be shorter than streamingBlockSize
        long numBytesToRead = (std::min<long>) (endOffset - offset + 1, streamingBlockSize);
        size_t numBytesRead = fread (buffer, 1, numBytesToRead, f);
        if (numBytesRead == 0)
        {
            // Read error or unexpected end of file, exit
            break;
        }
        // Send only the bytes that were actually read
        if (RawSocketWrite (socket, buffer, numBytesRead) == 0)
        {
            // RawSocketWrite() encountered a serious error, exit
            break;
        }
    }
    . . .
}
size_t RawSocketWrite (boost::asio::ip::tcp::socket *socket, const char *data, size_t len)
{
    size_t numCharsWritten = 0;
    try
    {
        // boost::asio::write() takes a stream reference, so dereference the pointer.
        // It blocks until all len bytes have been sent or an error occurs.
        numCharsWritten = boost::asio::write (*socket, boost::asio::buffer (data, len));
    }
    catch (boost::system::system_error& e)
    {
        LOG_ERROR (("error", "write() failed in RawSocketWrite (socket %d) %s", socket->native_handle (), e.what()));
        numCharsWritten = 0;
    }
    return numCharsWritten;
}
I'm trying to stream a 39 MB, 16-second video file with the following properties (courtesy of MediaInfo):
Video information
General
Complete name : TestVideo.mp4
Format : MPEG-4
Format profile : Base Media
Codec ID : isom
File size : 38.6 MiB
Duration : 16s 102ms
Overall bit rate : 20.1 Mbps
Video
ID : 1
Format : AVC
Format/Info : Advanced Video Codec
Format profile : High@L4.0
Format settings, CABAC : Yes
Format settings, ReFrames : 1 frame
Format settings, GOP : M=1, N=61
Codec ID : avc1
Codec ID/Info : Advanced Video Coding
Duration : 15s 701ms
Bit rate : 20.0 Mbps
Width : 1 920 pixels
Height : 1 080 pixels
Display aspect ratio : 16:9
Frame rate mode : Variable
Frame rate : 30.000 fps
Minimum frame rate : 29.732 fps
Maximum frame rate : 30.313 fps
Color space : YUV
Chroma subsampling : 4:2:0
Bit depth : 8 bits
Scan type : Progressive
Bits/(Pixel*Frame) : 0.322
Stream size : 37.5 MiB (97%)
Title : VideoHandle
Language : English
mdhd_Duration : 15701
Audio
ID : 2
Format : AAC
Format/Info : Advanced Audio Codec
Format profile : LC
Codec ID : 40
Duration : 16s 102ms
Source duration : 16s 131ms
Bit rate mode : Constant
Bit rate : 192 Kbps
Nominal bit rate : 96.0 Kbps
Channel(s) : 2 channels
Channel positions : Front: L R
Sampling rate : 48.0 KHz
Compression mode : Lossy
Stream size : 374 KiB (1%)
Source stream size : 375 KiB (1%)
Title : SoundHandle
Language : English
mdhd_Duration : 16102
The StreamFile() function streams blocks of 'streamingBlockSize' bytes in a tight loop to the output socket ('streamingBlockSize' is set through a configuration file, and was introduced while researching and debugging the current problem of buffer underrun).
Tracking the packets using Wireshark shows packets with 1448 bytes of streaming data being sent at an even pace:
Direction of all packets: 192.168.0.197 (port 10243) ------------------> 192.168.0.199 (port 58358)
|Time (s)    | Flags    | Len  | Seq      | Ack |
|14.420722000| SYN, ACK |      | 0        | 1   |
|14.437750000| PSH, ACK | 266  | 1        | 188 |
|14.437924000| ACK      | 1448 | 267      | 188 |
|14.437939000| ACK      | 1448 | 1715     | 188 |
|14.437950000| ACK      | 1448 | 3163     | 188 |
|14.442016000| ACK      | 1448 | 4611     | 188 |
|14.444269000| ACK      | 1448 | 6059     | 188 |
|14.444293000| ACK      | 1448 | 7507     | 188 |
|14.444358000| ACK      | 1448 | 8955     | 188 |
|14.444373000| ACK      | 1448 | 10403    | 188 |
|14.444389000| ACK      | 1448 | 11851    | 188 |
. . .
|72.768739000| ACK      | 1448 | 40488067 | 188 |
|72.768766000| ACK      | 1448 | 40489515 | 188 |
|72.772484000| ACK      | 1448 | 40490963 | 188 |
|72.772521000| PSH, ACK | 895  | 40492411 | 188 |
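As a sanity check, the relative sequence numbers in the last row can be used to count the TCP payload bytes sent on this connection. A minimal sketch, assuming the 266-byte first segment is the HTTP response header (an assumption; the capture excerpt does not show its contents):

```cpp
#include <cstdint>

// Wireshark's relative sequence numbers start at 1 for the first payload byte
// after the handshake, so the last payload byte is (last Seq + last Len - 1).
constexpr std::int64_t TotalPayloadBytes (std::int64_t lastSeq, std::int64_t lastLen)
{
    return lastSeq + lastLen - 1;
}

// Values from the final row of the capture.
constexpr std::int64_t kTotalPayload = TotalPayloadBytes (40492411, 895);

// Assumption: the first 266-byte segment is the HTTP response header.
constexpr std::int64_t kHttpHeaderBytes = 266;
constexpr std::int64_t kFileBytes = kTotalPayload - kHttpHeaderBytes;

// Size of the file data in MiB, for comparison with MediaInfo's "File size".
constexpr double kFileMiB = kFileBytes / (1024.0 * 1024.0);
```

Under that assumption the result is about 38.6 MiB, matching MediaInfo's reported file size, which suggests the capture covers the entire transfer.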
Wireshark provides very useful summary information about the packets above via the Statistics>Summary menu item:
Packets 27997
Between first and last packet 58.352 sec
Avg. packets/sec 479.797
Avg. packet size 1513.586 bytes
Bytes 42375867
Avg. bytes/sec 726213.548
Avg. MBit/sec 5.810
In other words, it took 58.352 seconds to transfer a 39 MB video with a playing time of 16.102 seconds, and the renderer stalled frequently throughout. This looks like a classic case of buffer underrun.
In addition, the average throughput measured by Wireshark was 5.81 Mbit/s. A sustained rate that low cannot keep up with a renderer that consumes the video at its overall bit rate of 20.1 Mbit/s.
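To put numbers on the gap, a small sketch that recomputes the measured throughput from the Wireshark summary and compares it with the MediaInfo bit rate. It assumes both rates are constant over the whole transfer, which is a simplification:

```cpp
// Average throughput in megabits per second (1 Mbit = 10^6 bits).
constexpr double Mbps (double bytes, double seconds)
{
    return bytes * 8.0 / seconds / 1e6;
}

// Figures from the Wireshark summary and the MediaInfo output.
constexpr double kCapturedBytes   = 42375867.0;  // Wireshark "Bytes"
constexpr double kTransferSeconds = 58.352;      // first to last packet
constexpr double kPlaySeconds     = 16.102;      // MediaInfo duration
constexpr double kVideoMbps       = 20.1;        // MediaInfo overall bit rate

constexpr double kMeasuredMbps = Mbps (kCapturedBytes, kTransferSeconds);

// Factor by which the delivery rate falls short of real-time playback.
constexpr double kShortfall = kVideoMbps / kMeasuredMbps;

// With constant rates, playback avoids stalls only if the player buffers for
// at least (transfer time - playing time) before it starts rendering.
constexpr double kMinStartupDelay = kTransferSeconds - kPlaySeconds;
```

Under these assumptions the link delivers roughly 3.5 times too slowly, and a player would need to buffer for about 42 seconds before starting to play the 16-second clip without stalling.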
Possible Fixes
While researching the problem I've come across numerous technical issues that might contribute to the problem, and would appreciate your thoughts.
Increase size of buffer passed to write()
I've tried varying the number of bytes passed to the write() function (e.g., 4096, 8192, 16384) to see whether larger writes speed up the transfer. It does not seem to make a difference (see the discussion of MTU and MSS below for a possible explanation).
Increase Ethernet MTU (Maximum Transmission Unit) and/or TCP MSS (Maximum Segment Size)
Wireshark shows that each TCP packet carries 1448 bytes of raw video data. Would increasing the MTU or MSS improve streaming throughput? http://www.stratus.com/blog/openvos/?p=1459 has an interesting comparison between MTU and MSS.
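The 1448-byte payload is what one would expect from a standard 1500-byte Ethernet MTU, assuming 20-byte IPv4 and TCP headers plus the 12-byte TCP timestamp option (an assumption; the TCP options are not visible in the excerpt). That gives 1514-byte captured frames, consistent with Wireshark's average packet size of 1513.586 bytes. A rough sketch of how much of each frame is payload at the standard MTU versus an assumed 9000-byte jumbo-frame MTU:

```cpp
// Assumed per-segment overheads in bytes (IPv4 without IP options).
constexpr int kEthernetHeader = 14;  // as captured by Wireshark (no FCS)
constexpr int kIpHeader       = 20;
constexpr int kTcpHeader      = 20;
constexpr int kTcpTimestamps  = 12;  // timestamp option plus padding

// TCP payload carried by one full-sized segment for a given MTU.
constexpr int PayloadPerSegment (int mtu)
{
    return mtu - kIpHeader - kTcpHeader - kTcpTimestamps;
}

// Fraction of each captured frame that is application data.
constexpr double Efficiency (int mtu)
{
    return static_cast<double> (PayloadPerSegment (mtu)) / (mtu + kEthernetHeader);
}
```

Under these assumptions, Efficiency (1500) is about 95.6% and Efficiency (9000) about 99.3%, so header overhead alone does not appear large enough to explain a gap between 5.81 and 20.1 Mbit/s.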
TCP_NODELAY
There are several pages discussing the socket option TCP_NODELAY (see http://en.wikipedia.org/wiki/Transmission_Control_Protocol). My understanding is that it helps when transferring many files, where the last bytes of each file typically do not fill the output buffer: by default TCP waits up to 200 ms for the buffer to fill, whereas with TCP_NODELAY there is no such delay. When streaming a single video file I would not expect an improvement. Is this correct?
Network load variability
Could variable load on the network itself be slowing the stream down?
boost::asio::write() is a blocking write -- would a non-blocking write help?
The boost::asio::write() at the very bottom is a blocking write:
try
{
    numCharsWritten = boost::asio::write (*socket, boost::asio::buffer (data, len));
}
Is there an inherent delay in a blocking write() compared with a non-blocking write()? Would a non-blocking write improve throughput?
Many thanks in advance for your help.