
In section 2.1 of the Speex codec manual it says:

Every speech codec introduces a delay in the transmission. For Speex, this delay is equal to the frame size, plus some amount of “look-ahead” required to process each frame. In narrowband operation (8 kHz), the delay is 30 ms, while for wideband (16 kHz), the delay is 34 ms. These values don’t account for the CPU time it takes to encode or decode the frames.
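The manual's delay figures are easiest to read as arithmetic. This small sketch assumes the 20 ms Speex frame used in this question. It derives the samples per frame and the look-ahead implied by "delay = frame size + look-ahead". The look-ahead values are computed from the quoted totals, not taken from the Speex source.

```python
FRAME_MS = 20  # Speex frame duration used in this question

# mode -> (sample rate in Hz, total algorithmic delay in ms quoted from the manual)
MODES = {
    "narrowband": (8000, 30),
    "wideband": (16000, 34),
}


def frame_samples(rate_hz: int, frame_ms: int = FRAME_MS) -> int:
    """Number of PCM samples in one frame at the given sample rate."""
    return rate_hz * frame_ms // 1000


def lookahead_ms(total_delay_ms: int, frame_ms: int = FRAME_MS) -> int:
    """Look-ahead implied by 'delay = frame size + look-ahead'."""
    return total_delay_ms - frame_ms


if __name__ == "__main__":
    for mode, (rate, delay) in MODES.items():
        print(f"{mode}: {frame_samples(rate)} samples/frame, "
              f"look-ahead {lookahead_ms(delay)} ms")
```

This gives 160 samples and 10 ms of look-ahead for narrowband, and 320 samples and 14 ms for wideband. Both are fixed latencies. They shift the output in time but do not reduce how many frames per second the encoder can produce.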

RFC 5574, RTP Payload Format for the Speex Codec, says:

ptime: SHOULD be a multiple of 20 msec

My encoded frames are 20 ms long, so I assume my ptime should be 20.
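With ptime = 20, each RTP packet carries exactly one 20 ms frame. The sketch below computes frames per packet and the RTP timestamp increment between packets. It assumes, as is usual for audio payloads, that the RTP clock rate equals the sampling rate. The function names are illustrative only.

```python
SPEEX_FRAME_MS = 20  # duration of one Speex frame


def frames_per_packet(ptime_ms: int) -> int:
    """How many 20 ms Speex frames go into one RTP packet for a given ptime."""
    if ptime_ms <= 0 or ptime_ms % SPEEX_FRAME_MS:
        raise ValueError("ptime should be a positive multiple of 20 ms")
    return ptime_ms // SPEEX_FRAME_MS


def timestamp_step(clock_rate_hz: int, ptime_ms: int) -> int:
    """RTP timestamp increment between consecutive packets.

    Assumes the RTP clock rate equals the audio sampling rate, so the
    timestamp advances by the number of samples carried per packet.
    """
    return clock_rate_hz * ptime_ms // 1000
```

For narrowband with ptime 20, this gives one frame per packet and a timestamp step of 160. With ptime 40 in wideband, each packet would hold two frames and the timestamp would advance by 640.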

Encoding introduces a delay of 30 ms or more, but the RTP packets go out every 20 ms. How is this supposed to work? Is every other RTP payload an empty packet? How do I resolve this?

This seems to be an issue with every codec, so I must be missing something fundamental about how streaming works.
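The question turns on the difference between latency and throughput. This minimal simulation treats the 30 ms from the manual as a constant per-frame latency, which is a simplification. It shows that a fixed delay only postpones the first packet. After that, packets still leave every 20 ms, provided the encoder keeps up.

```python
from typing import List


def send_times_ms(num_frames: int, frame_ms: float, latency_ms: float) -> List[float]:
    """Time at which each frame's RTP packet can be sent.

    Frame n finishes being captured at (n + 1) * frame_ms and is ready to
    send after a constant latency (look-ahead, buffering, or any fixed
    processing delay). The encoder is assumed to keep up with real time.
    """
    return [(n + 1) * frame_ms + latency_ms for n in range(num_frames)]


def intervals(times: List[float]) -> List[float]:
    """Gaps between consecutive send times."""
    return [b - a for a, b in zip(times, times[1:])]


if __name__ == "__main__":
    t = send_times_ms(5, frame_ms=20, latency_ms=30)
    print("send times:", t)            # first packet leaves at 50 ms
    print("intervals:", intervals(t))  # every gap is still 20 ms
```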

I have verified that I can stream a pre-encoded buffer and that it sounds as intended.

I have tried:

  • Creating a large queue at the start to compensate; however, it quickly drains to zero length.
  • Sending zero data as the payload
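Here is one plausible explanation for the draining queue. It assumes the encoder needs more than 20 ms of CPU time per frame on average. In that case the sender removes frames faster than the encoder adds them, so any finite prefill only postpones the underrun. The numbers in the usage note are hypothetical.

```python
def time_until_empty_ms(prefill_frames: int, frame_ms: float, encode_ms: float) -> float:
    """How long a prefilled queue of encoded frames lasts before running dry.

    The sender removes one frame every frame_ms; the encoder adds one every
    encode_ms. The queue shrinks at 1/frame_ms - 1/encode_ms frames per ms,
    so it empties after prefill / (1/frame_ms - 1/encode_ms) ms, computed
    here in the equivalent form prefill * frame_ms * encode_ms / (encode_ms - frame_ms).
    """
    if encode_ms <= frame_ms:
        return float("inf")  # encoder keeps up: the queue never drains
    return prefill_frames * frame_ms * encode_ms / (encode_ms - frame_ms)
```

For example, `time_until_empty_ms(50, 20, 25)` returns 5000.0. A full second of prefilled audio lasts only five seconds if each frame takes 25 ms to encode.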

Ideas I haven't yet tried:

  • Sending a packet consisting entirely of padding, with the padding bit set in the RTP header
  • Incrementing the sequence number but not the timestamp until the next actual payload is ready (this sounds like it goes against the spec?)
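Both ideas involve fields of the RTP fixed header defined in RFC 3550. The sequence number increments by one per packet, and the timestamp advances by the number of samples carried. The P bit marks trailing padding. The sketch below packs that 12-byte header to make the fields concrete. The payload type 97 and the SSRC value in the usage note are hypothetical; 97 is simply one of the dynamic payload types. If the encoder keeps up with real time, each packet simply advances both fields together.

```python
import struct

RTP_VERSION = 2


def rtp_header(seq: int, timestamp: int, ssrc: int, payload_type: int,
               marker: bool = False, padding: bool = False) -> bytes:
    """Pack the 12-byte RTP fixed header (RFC 3550) with no CSRCs or extension.

    If padding is True only the P bit is set; the caller must append the
    padding octets, the last of which holds the padding length.
    """
    byte0 = (RTP_VERSION << 6) | (int(padding) << 5)    # V=2, P, X=0, CC=0
    byte1 = (int(marker) << 7) | (payload_type & 0x7F)  # M bit, 7-bit PT
    return struct.pack("!BBHII", byte0, byte1,
                       seq & 0xFFFF,            # 16-bit sequence number, wraps
                       timestamp & 0xFFFFFFFF,  # 32-bit timestamp, wraps
                       ssrc & 0xFFFFFFFF)


def next_fields(seq: int, timestamp: int, samples_per_packet: int) -> tuple:
    """Advance sequence number and timestamp for the next packet."""
    return (seq + 1) & 0xFFFF, (timestamp + samples_per_packet) & 0xFFFFFFFF
```

For example, `rtp_header(seq=1, timestamp=160, ssrc=0x12345678, payload_type=97)` starts with the bytes 0x80 0x61. `next_fields(1, 160, 160)` gives the fields for the following 20 ms narrowband packet.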

Note: I'm now wondering whether the delay mentioned in the Speex manual is built into the encoded output, and whether the delay I'm seeing while streaming is due to my limited (embedded) CPU.
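One way to check this hypothesis is to measure the encoder's real-time factor: the average encode time per frame divided by the 20 ms frame duration. In this sketch, `encode` is a hypothetical stand-in for whatever function wraps the Speex encoder. On the embedded device itself, the same measurement would use the platform's own monotonic clock.

```python
import time
from typing import Callable, Sequence


def real_time_factor(encode: Callable[[bytes], bytes],
                     frames: Sequence[bytes], frame_ms: float = 20.0) -> float:
    """Average encode time per frame divided by the frame duration.

    A value below 1.0 means the encoder keeps up with real time; above 1.0
    means any send queue will eventually drain.
    """
    if not frames:
        raise ValueError("need at least one frame")
    start = time.perf_counter()
    for frame in frames:
        encode(frame)  # output discarded; only the elapsed time matters
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return (elapsed_ms / len(frames)) / frame_ms
```

A result above 1.0 would indicate that the CPU is the bottleneck, not the codec's algorithmic delay.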

Michael

1 Answer


My note was correct: the premise of this question is flawed.

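The Speex manual is referring to a delay in the audio output (the frame size plus look-ahead), not to the time spent processing each frame. As the manual itself notes, CPU time is not included in those figures. Therefore the problem described in the question is not actually a problem.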

I'm glad I asked the question; it helped me arrive at the solution.

Michael