I am trying to add conference chat support to an existing single-mic chat application (in which only one person can talk at a time). Streaming between the clients already works, and voice records and plays back well on both computers that are using the mic. But when a third person receives packets from both speakers, the audio comes out garbled. I searched around and found that I need to mix the two streams and play them as one. I tried a few algorithms I found on the internet, but I am not getting the result I need.

I am using Speex as the encoder/decoder. After decoding the incoming streams on the client side, I tried mixing the two byte arrays with the following algorithms.

Var Buffer1, Buffer2, MixedBuf: TIdBytes;
Begin
  For I := 0 To Length(Buffer1) - 1 Do Begin
    If Length(Buffer2) >= I Then
      MixedBuf[I] := (Buffer1[I] + Buffer2[I]) / 2
    Else
      MixedBuf[I] := Buffer1[I];
  End;
End;

The received buffers are either 492 or 462 bytes, so I check whether Buffer2 is smaller than Buffer1; if it is, I mix the first 462 bytes and copy the remaining bytes unaltered into MixedBuf.

This algorithm produces a lot of noise and distortion, and only part of the voice can be heard.

Another algorithm, which I found here on Stack Overflow in an answer by Mark Heath, first converts the bytes to floating-point values.

Var Buffer1, Buffer2, MixedBuf: TIdBytes;
    samplef1, samplef2, Mixed: Extended;
Begin
  For I := 0 To Length(Buffer1) - 1 Do Begin
    If Length(Buffer2) >= I Then Begin
        samplef1 := Buffer1[I] / 65535;
        samplef2 := Buffer2[I] / 65535;
        Mixed := samplef1 + samplef2;
        if (Mixed > 1.0) Then Mixed := 1.0;
        if (Mixed < -1.0) Then Mixed := -1.0;

        MixedBuf[I] := Round(Mixed * 65535);
    End Else
      MixedBuf[I] := Buffer1[I];
  End;
End;

The value never goes below 0, but I left the check for values below -1.0 as it was in the algorithm. This method works a lot better, but there is still noise and distortion, and the voice from the second stream is always really faint while the voice from the first stream is as loud as it is supposed to be. Even when the first person is not talking, the second voice is faint.

P.S. Some details about the audio stream:

The tWAVEFORMATEX record used for audio recording and playback is filled in as follows:

FWaveFormat.wFormatTag := WAVE_FORMAT_PCM;
FWaveFormat.nChannels := 1;
FWaveFormat.nSamplesPerSec := WAVESAMPLERATE; // WAVESAMPLERATE = 16000
FWaveFormat.nAvgBytesPerSec := WAVESAMPLERATE*2;
FWaveFormat.nBlockAlign := 2;
FWaveFormat.wBitsPerSample := 16;
FWaveFormat.cbSize := SizeOf(tWAVEFORMATEX);

I hope I am providing all the information needed.

elixenide
Junaid Noor
  • Are Buffer1 and Buffer2 byte arrays? If so you need to sign extend them into a 16 bit signed data type before the addition. – jaket Jul 02 '14 at 19:00
  • Are both streams the same sample rate and bit width? Do they have the same "DC offset" (like noise floor)? I'd probably try to normalize them first before mixing them. – David Schwartz Jul 02 '14 at 19:49
  • Yes, they are byte arrays. How do I sign-extend them? Any code? And yes, both streams have the same sample rate and bits per sample; the FWaveFormat info applies to both streams. – Junaid Noor Jul 02 '14 at 21:48

1 Answer

FWaveFormat.wBitsPerSample := 16;

You need to respect the fact that your samples are 16 bits wide. Your code operates on 8 bits at a time. You could write it something like this:

function MixAudioStreams(const strm1, strm2: TBytes): TBytes;
// assumes 16 bit samples, single channel, common sample rate
// needs Math in the uses clause (for Max and EnsureRange)
var
  i: Integer;
  n1, n2, nRes: Integer;
  p1, p2, pRes: PSmallInt;
  samp1, samp2: Integer;
begin
  Assert(Length(strm1) mod 2 = 0);
  Assert(Length(strm2) mod 2 = 0);
  n1 := Length(strm1) div 2;
  n2 := Length(strm2) div 2;
  nRes := Max(n1, n2);
  SetLength(Result, nRes*2);
  p1 := PSmallInt(strm1);
  p2 := PSmallInt(strm2);
  pRes := PSmallInt(Result);
  for i := 0 to nRes-1 do begin
    if i < n1 then begin
      samp1 := p1^;
      inc(p1);
    end else begin
      samp1 := 0;
    end;
    if i < n2 then begin
      samp2 := p2^;
      inc(p2);
    end else begin
      samp2 := 0;
    end;
    pRes^ := EnsureRange(
      (samp1+samp2) div 2,
      low(pRes^),
      high(pRes^)
    );
    inc(pRes);
  end;
end;
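To see why the byte-oriented loops in the question distort the audio, it can help to mix a single pair of samples both ways. The following console program is only an illustrative sketch: the program name, the variable names, and the sample values -2 and 2 are made up for the example, and it assumes a little-endian machine (as on x86 Windows).

```pascal
program ByteVsSampleMix;
{$APPTYPE CONSOLE}

var
  a, b, good, bad: SmallInt;
  aBytes, bBytes, mixedBytes: array[0..1] of Byte;
begin
  a := -2;  // stored as bytes $FE $FF on a little-endian machine
  b := 2;   // stored as bytes $02 $00
  Move(a, aBytes, SizeOf(a));
  Move(b, bBytes, SizeOf(b));

  // Byte-wise average, in the style of the question's first loop
  mixedBytes[0] := (aBytes[0] + bBytes[0]) div 2;  // ($FE + $02) div 2 = $80
  mixedBytes[1] := (aBytes[1] + bBytes[1]) div 2;  // ($FF + $00) div 2 = $7F
  Move(mixedBytes, bad, SizeOf(bad));

  // Sample-wise average on the 16-bit values
  good := (a + b) div 2;

  Writeln('byte-wise:   ', bad);   // 32640 ($7F80)
  Writeln('sample-wise: ', good);  // 0
end.
```

Two nearly silent samples that should cancel out turn into a value close to full scale when the low and high bytes are averaged independently, which is consistent with the noise described in the question.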

Some people recommend scaling by sqrt(2) to maintain the combined power of the two signals. That would look like this:

pRes^ := EnsureRange(
  Round((samp1+samp2) / Sqrt(2.0)),
  low(pRes^),
  high(pRes^)
);
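If halving (or dividing by sqrt(2)) leaves both voices too quiet, another option is the add-and-clip idea from the floating-point attempt in the question, applied to whole 16-bit samples: sum at full level and clamp the result. This is a sketch under that assumption, not a recommendation; MixSampleSaturating is a hypothetical helper name, and clipping will itself distort whenever both speakers are loud at the same time.

```pascal
program SaturatingMix;
{$APPTYPE CONSOLE}

uses
  Math;

// Hypothetical helper: adds two 16-bit samples without attenuation
// and clamps the sum to the SmallInt range instead of letting it wrap.
function MixSampleSaturating(samp1, samp2: Integer): SmallInt;
begin
  Result := EnsureRange(samp1 + samp2, Low(SmallInt), High(SmallInt));
end;

begin
  Writeln(MixSampleSaturating(1000, -250));      // 750
  Writeln(MixSampleSaturating(30000, 10000));    // 32767 (clipped)
  Writeln(MixSampleSaturating(-30000, -10000));  // -32768 (clipped)
end.
```

In the function above, this would replace the `(samp1+samp2) div 2` expression inside the existing EnsureRange call.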
David Heffernan
  • OK, I fixed the access violation. That was a silly missing div 2. – David Heffernan Jul 03 '14 at 08:24
  • As for the clicking I'm not sure. I suspect that you need to do some more advanced signal processing. – David Heffernan Jul 03 '14 at 08:25
  • Both streams play a lot better and the sound is at the same volume now, but there is a continuous clicking sound, and after a while it starts raising an access violation on the line `samp1 := p1^;`. If I change `for i := 0 to nRes-1 do begin` to `for i := 0 to (nRes-1) div 2 do begin`, since it reads two bytes every time the loop runs, then the click and the access violation go away, but both voices become too faint, and the second voice is close to inaudible if the first person is talking, even very softly. – Junaid Noor Jul 03 '14 at 09:24
  • I figured it out: the page wasn't refreshed. I tried editing my first comment but it wouldn't let me, so I deleted it and then saw your comment. Now the clicking issue is fixed too, but the voices being too faint is back. – Junaid Noor Jul 03 '14 at 09:26
  • Perhaps you need to normalize the streams in some way. Without knowing more details about the contents of the streams it's hard for me to say more. All I've really tried to do here is show how to combine the two streams, and process the data correctly. Your code obviously had that big problem with being byte oriented. But I think you'll need to do some more detailed signal processing. – David Heffernan Jul 03 '14 at 09:33
  • Thanks. I will search around for a more detailed signal processing algorithm; you solved the first hurdle :) – Junaid Noor Jul 03 '14 at 09:48
  • Can you tell me how to play this stream? – user2200585 Mar 18 '16 at 16:24