How can a server running an audio conference determine who the active speaker (or speakers) is? I want to show an icon next to users who are currently speaking and show the video of the most active speaker.
I think I need something like:
- Calculate a score for each user based on audio energy/power/levels.
- Normalize the score between all users.
- Calculate the score over several audio frames to prevent rapid changes.
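
To make the three steps above concrete, here is a rough sketch of what I have in mind. It assumes the server already has decoded 16-bit PCM frames for each user. The names `SpeakerTracker`, `window`, and `talk_threshold`, and the threshold value itself, are made up for illustration and are not taken from FreeSWITCH or any other library.

```python
import math
from collections import deque


def frame_rms(samples):
    """Root-mean-square level of one frame of PCM samples (a list of ints)."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


class SpeakerTracker:
    """Hypothetical helper: tracks a smoothed energy score per user."""

    def __init__(self, window=10, talk_threshold=500.0):
        self.window = window                  # frames to average over (assumed value)
        self.talk_threshold = talk_threshold  # RMS level counted as talking (assumed value)
        self.history = {}                     # user id -> deque of recent frame RMS values

    def push_frame(self, user, samples):
        """Step 1: record the energy of one decoded frame for a user."""
        levels = self.history.setdefault(user, deque(maxlen=self.window))
        levels.append(frame_rms(samples))

    def _smoothed(self):
        """Step 3: average each user's level over the last `window` frames."""
        return {u: sum(h) / len(h) for u, h in self.history.items() if h}

    def scores(self):
        """Step 2: normalize the smoothed levels so they sum to 1 across users."""
        smoothed = self._smoothed()
        total = sum(smoothed.values())
        if total == 0:
            return {u: 0.0 for u in smoothed}
        return {u: level / total for u, level in smoothed.items()}

    def talking(self):
        """Users whose smoothed level is at or above the talk threshold."""
        return {u for u, level in self._smoothed().items()
                if level >= self.talk_threshold}

    def dominant(self):
        """User with the highest smoothed level, or None if everyone is silent."""
        smoothed = self._smoothed()
        if not smoothed or max(smoothed.values()) == 0:
            return None
        return max(smoothed, key=smoothed.get)
```

In this sketch `talking()` would drive the per-user icon and `dominant()` would pick whose video to show. A plain moving average is only one possible way to smooth the score.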
Do I need to do the calculation on the raw (decoded) audio, or is it possible to get the score from the encoded packets (Speex/Opus)? Is there a way to extract this information from the protocol that transfers the audio (RTMP or SDP)?
In FreeSWITCH there is a status field for each participant with flags for talking and floor owner. This is the code that calculates the score, but I can't understand how it actually works.
Thanks