
How can a server running an audio conference find out who the active speaker (or speakers) is? I want to show an icon next to users who are currently speaking and show the video of the most active speaker.

I think I need something like:

  • Calculate a score for each user based on audio energy/power/levels.
  • Normalize the score between all users.
  • Calculate the score on several audio frames to prevent rapid changes.

Do I need to run the calculation on the raw audio, or is it possible to get the score from the encoded packets (Speex/Opus)? Is there a way to extract this information from the protocol carrying the audio (RTMP or SDP)?

In FreeSWITCH there is a status field for each participant with flags for talking and floor owner. This is the code that calculates the score, but I can't understand how it actually works.

Thanks

pablo

1 Answer


Usually the participant who sends the loudest audio stream is considered the 'active speaker'.

So you have to calculate the volume for each audio stream. How to measure the volume depends on the encoding of your audio stream. Check out this question on how to calculate the volume for PCM audio.
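As a rough illustration of the volume measurement for PCM, here is a minimal sketch. It assumes each participant's audio has already been decoded to 16-bit little-endian mono PCM; the names `rms_dbfs` and `loudest` are made up for this example and do not come from any conferencing library.

```python
import math
import struct


def rms_dbfs(frame):
    """Return the RMS level of a 16-bit little-endian mono PCM frame in dBFS.

    0 dBFS corresponds to full scale; silence (or an empty frame) gives -inf.
    """
    count = len(frame) // 2
    if count == 0:
        return -math.inf
    samples = struct.unpack("<%dh" % count, frame[:count * 2])
    mean_square = sum(s * s for s in samples) / count
    if mean_square == 0:
        return -math.inf
    # Normalise by 32768 so the result does not depend on the sample width.
    return 20 * math.log10(math.sqrt(mean_square) / 32768.0)


def loudest(frames):
    """Given {participant: pcm_frame}, return the participant with the highest level."""
    if not frames:
        return None
    return max(frames, key=lambda user: rms_dbfs(frames[user]))
```

Because the level is expressed relative to full scale, streams that were decoded from different codecs can be compared on the same scale, although differences in microphone gain between participants are not compensated for here.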

Gene Vincent
  • Do I have to decode the streams or is there info in the packet protocol (RTMP and WebRTC)? How can I normalize the audio of all users so comparison will be reasonable? – pablo Dec 21 '13 at 22:03
  • Unless you come up with a measurement that is comparable across different encodings, I would assume you do have to decode the streams to a common audio format. But the conferencing code would probably have to decode all audio streams anyway in order to mix them into a common audio feed to send to each participant. – Gene Vincent Dec 21 '13 at 22:17
  • Note that for this approach, you'd want a slight delay before switching who is considered active, so that a short, loud noise from one participant won't immediately switch to them. – Kat Aug 24 '15 at 22:30
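
The delay suggested in the last comment could be sketched roughly as follows. This is only one possible approach, not how FreeSWITCH or any other server does it: per-participant levels (assumed to be in dBFS, one value per audio frame) are smoothed with an exponential moving average, and a new leader must stay on top for `hold_frames` consecutive frames before it takes over. The class name and all parameter values are hypothetical.

```python
class ActiveSpeakerSelector:
    """Pick the active speaker from per-frame levels, with smoothing and a hold-off."""

    def __init__(self, smoothing=0.2, hold_frames=25, floor_db=-60.0):
        self.smoothing = smoothing      # weight given to the newest frame (0..1)
        self.hold_frames = hold_frames  # frames a challenger must lead before switching
        self.floor_db = floor_db        # levels below this are clamped (handles -inf)
        self.levels = {}                # participant -> smoothed level in dB
        self.active = None
        self.candidate = None
        self.candidate_frames = 0

    def update(self, levels_db):
        """Feed one frame of {participant: level_db}; return the current active speaker."""
        for user, level in levels_db.items():
            level = max(level, self.floor_db)
            previous = self.levels.get(user, self.floor_db)
            self.levels[user] = previous + self.smoothing * (level - previous)
        if not self.levels:
            return self.active

        leader = max(self.levels, key=self.levels.get)
        if leader == self.active:
            # The current speaker is still loudest, so forget any challenger.
            self.candidate, self.candidate_frames = None, 0
        else:
            if leader != self.candidate:
                self.candidate, self.candidate_frames = leader, 0
            self.candidate_frames += 1
            if self.candidate_frames >= self.hold_frames:
                self.active = leader
                self.candidate, self.candidate_frames = None, 0
        return self.active
```

With 20 ms frames, `hold_frames=25` would correspond to about half a second of hold-off; the right value is a tuning decision that depends on how quickly the display should follow the conversation.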