
I'm concerned with the design of an adaptive jitter buffer that increases and decreases its capacity as the calculated jitter rises and falls.

I see no reason to make any adjustments to latency or capacity unless there is a buffer underrun, which might then be followed by a burst of incoming packets that exceeds capacity (assuming buffer capacity equals buffer depth/latency in the first place). As an example, if I'm receiving 20ms packets, I might well implement a buffer that is 100ms deep and therefore has capacity for 5 packets. If 160ms passes between packets, then I might expect to see as many as 8 packets come in nearly all at once. At this point I have two choices (sketched in code after the list):

  1. drop three of the packets according to the rules of overflow
  2. drop no packets and increase buffer capacity as well as latency
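
A minimal sketch of the two policies, using a plain Python list as the buffer and made-up names (20ms frames and a 100ms depth as in the example above):

```python
# Hypothetical sketch: 20 ms frames, a 100 ms deep buffer (5 frames of capacity),
# and a burst of 8 frames arriving after a 160 ms gap.

FRAME_MS = 20
DEPTH_MS = 100
CAPACITY = DEPTH_MS // FRAME_MS          # 5 frames

def on_burst(buffer, burst, grow=False):
    """Apply one of the two policies to a burst of late frames.

    grow=False -> choice 1: keep capacity fixed, drop whatever overflows.
    grow=True  -> choice 2: drop nothing, let capacity (and latency) grow.
    """
    dropped = []
    capacity = CAPACITY
    for frame in burst:
        if len(buffer) < capacity:
            buffer.append(frame)
        elif grow:
            capacity += 1                # depth/latency grows by FRAME_MS per extra frame
            buffer.append(frame)
        else:
            dropped.append(frame)        # "rules of overflow": discard the excess
    return dropped, capacity * FRAME_MS  # (frames dropped, resulting depth in ms)

burst = [f"frame{i}" for i in range(8)]  # 160 ms gap -> up to 8 frames at once
print(on_burst([], burst, grow=False))   # (['frame5', 'frame6', 'frame7'], 100)
print(on_burst([], burst, grow=True))    # ([], 160)
```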

Assume choice 2 and that network conditions improve and packet delivery becomes regular again (the jitter value drops). Now what? Again, I think I have two choices:

  1. do nothing and live with the increased latency
  2. reduce latency (and capacity)

With an adaptive buffer, I think I'm supposed to take the second of those choices (reduce latency and capacity), but that doesn't seem right, because it requires that I artificially/arbitrarily drop packets of audio that were specifically saved when I chose to increase capacity upon encountering the greater jitter in the first place.

It seems to me that the correct course of action is to take choice #1 from the first list in the first place: maintain latency and, if necessary, drop packets that are delivered late due to the increased jitter.

A similar scenario might be that instead of getting a burst of 8 packets after the 160ms gap, I only get 5 (perhaps 3 packets were simply lost). In that case, having increased the buffer capacity doesn't accomplish much, though it does reduce the potential for overflow later on. But if overflow (from the network side) is something to be avoided, then I would simply make buffer capacity some fixed amount greater than the configured 'depth/latency' in the first place. In other words, if overflow is not caused by the local application failing to get packets out of the buffer in a timely manner, then it can only happen for two reasons: either the sender lies and sends packets at a faster rate than agreed upon (or sends packets from the future), or there is a gap between packet bursts that exceeds my buffer depth.
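
If capacity is just fixed at some margin above the configured depth, the sizing is trivial; a sketch of that alternative, with an arbitrary 60ms headroom as the assumption:

```python
# Sketch of a fixed headroom above the configured depth, instead of adapting:
# overflow then only occurs if the sender misbehaves or a gap exceeds
# depth + headroom.

FRAME_MS = 20

def buffer_capacity(depth_ms, headroom_ms=60):
    """Capacity in frames: configured depth plus a fixed safety margin."""
    return (depth_ms + headroom_ms) // FRAME_MS

print(buffer_capacity(100))      # 8 frames: survives the 160 ms gap above
```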

Clearly, the whole point of the 'adaptive' buffer would be to recognize the latter condition, increase buffer capacity, and avoid dropping any packets. But that brings me right to the stated problem: how do I 'adapt' back to the ideal settings when network jitter clears up while still enforcing the same 'drop no packets' philosophy?

Thoughts?

alpartis
  • Check the jitter buffer implemented in the Asterisk project: http://www.voip-info.org/wiki/view/Asterisk+new+jitterbuffer. It is adaptive. – ItsMe May 05 '15 at 21:37

1 Answer


With companding. When the jitter clears up, you merge packets and 'accelerate' the buffer. The merge will of course need appropriate handling, but the idea is to pop two 20ms packets from the AJB and create one 30ms packet. You keep doing this until your buffer level is back to normal.

Similarly for underrun, packets can be 'stretched' in addition to introducing latency.
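
A rough sketch of that accelerate/merge step, assuming 8 kHz audio and a plain linear cross-fade over the overlap (a real implementation would use pitch-aware time-scale modification rather than this naive blend):

```python
import numpy as np

RATE = 8000                                 # assumed sample rate
FRAME = RATE // 50                          # 160 samples = 20 ms

def accelerate(f1, f2, overlap=FRAME // 2):
    """Merge two 20 ms frames into ~30 ms by cross-fading the middle."""
    fade_out = np.linspace(1.0, 0.0, overlap)
    fade_in = 1.0 - fade_out
    middle = f1[-overlap:] * fade_out + f2[:overlap] * fade_in
    return np.concatenate([f1[:-overlap], middle, f2[overlap:]])

f1 = np.zeros(FRAME)
f2 = np.ones(FRAME)
merged = accelerate(f1, f2)
print(len(merged) / RATE * 1000)            # 30.0 ms
```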

tan
  • By definition, underrun means there are no packets in the buffer. So which packets get stretched? I'm not sure that your suggestion makes sense. I've already got a gap in the audio playback once the underrun is encountered. Stretching out the packets that might burst in after the underrun seems more likely to cause unnecessary overflow. – alpartis Nov 25 '14 at 06:10
  • Don't wait for an underrun to occur. When your buffer reaches some low-water mark, start stretching. – tan Nov 25 '14 at 06:37
  • I disagree with your suggestion. If there are still packets in the buffer, there is no reason to start distorting playback and increasing latency just because a couple of packets are late. If those packets eventually arrive, still in time to be played in the correct timeslot, then audio playback has to be distorted a second time to reduce the latency that was just artificially introduced. In this scenario there were two distortions introduced by the playback system quite unnecessarily, i.e. there was no need for an adaptive algorithm to kick in at this point. – alpartis Nov 25 '14 at 16:29
  • For an adaptive JB, I would want to increase and decrease latency based on jitter. I will increase latency when there is high jitter or an underrun. Ideally I don't want gaps when I introduce latency, so artificial frames can be generated from previously decoded frames, or current frames can be expanded. Once I have increased the latency, the frames can be popped out at the normal rate. Now, if the buffer is full after some time, accelerating is a better idea than dropping packets. If the network is very bursty, then the AJB should identify this condition and keep the latency and buffer size high. – tan Nov 26 '14 at 08:36
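
Put together, the decision logic described in this thread might look roughly like the following; the thresholds and the jitter-to-latency mapping are placeholder assumptions, not taken from Asterisk or any other real jitter buffer:

```python
FRAME_MS = 20

def choose_action(buffered_ms, jitter_ms, min_ms=40, headroom_ms=60):
    """Pick a playout action for the next tick of a hypothetical adaptive JB."""
    target_ms = max(min_ms, 2 * jitter_ms)        # placeholder jitter -> latency mapping
    if buffered_ms < FRAME_MS:                    # low-water mark: about to underrun
        return "stretch"                          # expand current/previous frame
    if buffered_ms > target_ms + headroom_ms:     # far above target: excess latency
        return "accelerate"                       # merge frames instead of dropping them
    return "play"                                 # normal-rate playout

print(choose_action(buffered_ms=10, jitter_ms=5))     # stretch
print(choose_action(buffered_ms=160, jitter_ms=5))    # accelerate
print(choose_action(buffered_ms=160, jitter_ms=80))   # play: bursty network, latency stays high
```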