I'm using Speex to encode audio, send it over UDP, and decode it on the other side. I ran a few tests and noticed that if I decode a packet straight after encoding it, the decoded data is nowhere near the original: most of the values at the start of the buffer are 0. So when I decode the audio sent over UDP, all I get is noise. This is how I am encoding the audio:

bool AudioEncoder::encode( float *raw, char *encoded_bits )
{
    for ( size_t i = 0; i < 256; i++ )
        this->_rfdata[i] = raw[i];
    speex_bits_reset(&this->_bits);
    speex_encode(this->_state, this->_rfdata, &this->_bits);
    int bytesWritten = speex_bits_write(&this->_bits, encoded_bits, 512);
    if (bytesWritten)
        return true;
    return false;
}

This is how I am decoding the audio:

float *f = new float[256];
// recvbuf is the buffer I pass to my recv function on the socket
speex_bits_read_from(&this->_bits, recvbuf, 512);
speex_decode(this->state, &this->_bits, f);

I've checked the docs, and most of my code comes from the encoding/decoding sample on the Speex website. I'm not sure what I'm missing here.

dotminic
  • Speex is a lossy codec; the resulting stream will differ from the original because you lose information in order to achieve better compression. – Paulo Scardine Nov 25 '10 at 18:23
  • @Paulo Scardine if I encode an array with the values from a sine wave, the first ~20 floats (once decoded) are all equal to 0. I know it's lossy, but there I'm losing most of the data. I also get some negative values where I had positive values. – dotminic Nov 25 '10 at 18:33
  • seems like a signed/unsigned data type problem. – Paulo Scardine Nov 25 '10 at 19:24
  • what does it have to do with signed/unsigned data ? – dotminic Nov 25 '10 at 20:20

3 Answers

Actually, Speex introduces an additional delay to the audio data, as I found out by reverse engineering:

narrow band     : delay = 200 - framesize + lookahead = 200 - 160 +  40 =  80 samples

wide band       : delay = 400 - framesize + lookahead = 400 - 320 + 143 = 223 samples

ultra-wide band : delay = 800 - framesize + lookahead = 800 - 640 + 349 = 509 samples

Since the lookahead is initialized with zeros, you observe the first few samples to be "close to zero".

To get the timing right, you must skip those samples before you reach the audio data you actually fed into the codec. Why that is, I don't know. Probably the author of Speex never cared about this, since Speex is meant for streaming, not primarily for storing and restoring audio data. Another workaround (to avoid wasting space) is to feed (framesize - delay) zeros into the codec before your actual audio data, and then drop the entire first Speex frame.
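
Both workarounds can be sketched in plain C++ without touching the Speex API. This is only a minimal sketch: `skipCodecDelay` and `padForWholeFrameDrop` are hypothetical helper names, and the delay values passed in are the ones measured in this answer, not figures from the Speex documentation.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: drop the first `delaySamples` samples of a decoded
// stream so that it lines up with the audio that was fed to the encoder.
std::vector<float> skipCodecDelay(const std::vector<float>& decoded,
                                  std::size_t delaySamples)
{
    if (delaySamples >= decoded.size())
        return {};  // nothing but codec delay in this buffer
    return std::vector<float>(
        decoded.begin() + static_cast<std::ptrdiff_t>(delaySamples),
        decoded.end());
}

// Hypothetical helper for the second workaround: prepend
// (frameSize - delaySamples) zeros to the input. After decoding, padding
// plus codec delay fill exactly one frame, which can be dropped whole.
// Assumes delaySamples < frameSize, which holds for the three delays above.
std::vector<float> padForWholeFrameDrop(const std::vector<float>& input,
                                        std::size_t frameSize,
                                        std::size_t delaySamples)
{
    const std::size_t padding =
        delaySamples < frameSize ? frameSize - delaySamples : 0;
    std::vector<float> padded(padding, 0.0f);
    padded.insert(padded.end(), input.begin(), input.end());
    return padded;
}
```

For narrow band this means either skipping 80 decoded samples, or prepending 160 - 80 = 80 zeros and discarding the first decoded frame.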

I hope this clarifies everything. If someone familiar with Speex reads this, feel free to correct me if I am wrong.

EDIT: Actually, both the decoder and the encoder have a lookahead. The actual formula for the delay is:

narrow band     : delay = decoder_lh + encoder_lh =  40 +  40 =  80 samples

wide band       : delay = decoder_lh + encoder_lh =  80 + 143 = 223 samples

ultra-wide band : delay = decoder_lh + encoder_lh = 160 + 349 = 509 samples
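
As a quick check of that arithmetic, the three modes can be written down as a small table. This is a sketch only: `ModeDelay`, `kModes` and `totalDelay` are hypothetical names, and the lookahead numbers are the ones I measured, not values taken from the Speex headers.

```cpp
// Lookahead values as reported in this answer (found by reverse
// engineering, so treat them as assumptions rather than documented facts).
struct ModeDelay {
    const char* name;
    int frameSize;         // samples per frame
    int decoderLookahead;  // samples
    int encoderLookahead;  // samples
};

constexpr ModeDelay kModes[] = {
    {"narrow band",     160,  40,  40},
    {"wide band",       320,  80, 143},
    {"ultra-wide band", 640, 160, 349},
};

// delay = decoder_lh + encoder_lh
constexpr int totalDelay(const ModeDelay& m)
{
    return m.decoderLookahead + m.encoderLookahead;
}

static_assert(totalDelay(kModes[0]) == 80,  "narrow band delay");
static_assert(totalDelay(kModes[1]) == 223, "wide band delay");
static_assert(totalDelay(kModes[2]) == 509, "ultra-wide band delay");
```

libspeex also has a `SPEEX_GET_LOOKAHEAD` request for `speex_encoder_ctl` and `speex_decoder_ctl`, which is probably a safer source for these numbers than hard-coding them.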
Thilo Köhler

I found the reason the decoded data was so different. There is the fact that it's a lossy compression, as Paulo Scardine said, and also that Speex only works with 160 frames, so data going from PortAudio to Speex needs to be delivered in "packets" of 160 frames.
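
One way to get there when the capture side delivers blocks of another size (256 samples in my question) is to accumulate samples and hand them to the encoder 160 at a time. A rough sketch, assuming a frame size greater than zero; `FrameChunker` is a hypothetical helper and not part of Speex or PortAudio:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical accumulator: takes capture blocks of any size and hands
// them back out as fixed-size frames suitable for the encoder.
class FrameChunker {
public:
    explicit FrameChunker(std::size_t frameSize) : frameSize_(frameSize) {}

    // Append `count` samples, e.g. from an audio capture callback.
    void push(const float* samples, std::size_t count)
    {
        pending_.insert(pending_.end(), samples, samples + count);
    }

    // Copy the next complete frame into `frame`. Returns false when fewer
    // than frameSize samples are buffered; leftovers stay for the next call.
    bool pop(std::vector<float>& frame)
    {
        if (pending_.size() < frameSize_)
            return false;
        const auto end =
            pending_.begin() + static_cast<std::ptrdiff_t>(frameSize_);
        frame.assign(pending_.begin(), end);
        pending_.erase(pending_.begin(), end);
        return true;
    }

    std::size_t buffered() const { return pending_.size(); }

private:
    std::size_t frameSize_;
    std::vector<float> pending_;
};
```

Each frame returned by `pop` would then go through one `speex_encode` call. The frame size itself can be read from the encoder with the `SPEEX_GET_FRAME_SIZE` request instead of hard-coding 160.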

dotminic
  • What do you mean by 160 frames? A frame should contain 160 shorts, or 320 bytes, right? – guness Jan 26 '13 at 03:11
  • Framesize always refers to the decoded data frame size in samples, and it depends on the encoding mode. Narrowband (8 kHz): framesize = 160 samples = 320 bytes of PCM; Wideband (16 kHz): framesize = 320 samples = 640 bytes of PCM; Ultra Wideband (32 kHz): framesize = 640 samples = 1280 bytes of PCM. – Paulo Fidalgo Apr 18 '13 at 15:15

You may want to have a look here for some simple encoding/decoding: http://www.speex.org/docs/manual/speex-manual/node13.html#SECTION001310000000000000000

Since you are using UDP, you may also want to use a jitter buffer to re-order packets and to cope with late or lost ones.
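
To illustrate just the re-ordering part, here is a minimal sketch. It is not a full jitter buffer (there is no timing or playout logic), `ReorderBuffer` is a hypothetical class, and it assumes the sender prefixes each packet with a sequence number that does not wrap around, which the code in the question does not do yet.

```cpp
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

// Hypothetical re-order buffer keyed by a sender-assigned sequence number.
class ReorderBuffer {
public:
    explicit ReorderBuffer(std::uint32_t firstSeq = 0) : nextSeq_(firstSeq) {}

    // Store a received packet. Packets older than the next expected one
    // are ignored, and duplicates do not overwrite the stored copy.
    void push(std::uint32_t seq, std::vector<char> payload)
    {
        if (seq < nextSeq_)
            return;
        packets_.emplace(seq, std::move(payload));
    }

    // Hand out the next packet in sequence, if it has arrived.
    bool pop(std::vector<char>& payload)
    {
        auto it = packets_.find(nextSeq_);
        if (it == packets_.end())
            return false;
        payload = std::move(it->second);
        packets_.erase(it);
        ++nextSeq_;
        return true;
    }

    // Give up on the next expected packet and treat it as lost.
    void skip()
    {
        packets_.erase(nextSeq_);
        ++nextSeq_;
    }

private:
    std::uint32_t nextSeq_;
    std::map<std::uint32_t, std::vector<char>> packets_;
};
```

Speex also ships a jitter buffer of its own (see `speex/speex_jitter.h`), which is probably the better choice once single packets decode correctly.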

dmck
  • I've seen that link, and no, not using a jitter buffer yet, since I can't even get one packet encoded/sent/decoded/played correctly yet. I'm pulling my hair out over this! – dotminic Nov 25 '10 at 20:43