Top n loudest audio packet forwarding from media in a conference

Question

I have an SFU media server that simply relays each participant's audio packets to everyone except the sender. For a large-scale conference, sending every participant's audio packets to everyone is not a good idea once the participant count goes beyond 50-100 (let's assume everyone's mic is on).

One workaround that came to mind is to relay only the top n loudest audio packets (ranked by RMS value) and discard the others. But this approach has some risks:

A participant can be removed from the top n while still talking (because of a noise or side-talk spike from someone else's mic), which can ruin the audio experience.
Frequent false switches can happen between the top n members and the rest, which also creates chaos.
A participant in a slightly noisy environment can stay in the top n all the time, whether they are talking or not.
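To make "loudness" concrete, here is a minimal sketch of a per-packet level measure, assuming the media server has access to decoded 16-bit mono PCM in native byte order (many SFUs never decode audio and would need a level supplied by the sender instead). The names rms_dbfs and SmoothedLevel, the -96 dB silence floor, and the attack/release constants are all illustrative assumptions, not taken from any existing server.

```python
import math
from array import array

FULL_SCALE = 32768.0  # magnitude of the largest 16-bit PCM sample
SILENCE_DB = -96.0    # assumed floor reported for an empty or all-zero frame


def rms_dbfs(pcm_bytes: bytes) -> float:
    """RMS level of one frame of 16-bit native-endian mono PCM, in dBFS."""
    samples = array("h")
    samples.frombytes(pcm_bytes)  # raises ValueError on an odd byte count
    if not samples:
        return SILENCE_DB
    mean_square = sum(s * s for s in samples) / len(samples)
    if mean_square == 0:
        return SILENCE_DB
    return max(SILENCE_DB, 10.0 * math.log10(mean_square / FULL_SCALE ** 2))


class SmoothedLevel:
    """Exponential smoothing with separate rise (attack) and fall (release) rates."""

    def __init__(self, attack: float = 0.5, release: float = 0.05):
        self.attack = attack      # assumed: follow rising levels quickly
        self.release = release    # assumed: decay slowly across short pauses
        self.value = SILENCE_DB

    def update(self, level_db: float) -> float:
        alpha = self.attack if level_db > self.value else self.release
        self.value += alpha * (level_db - self.value)
        return self.value
```

The slow release keeps a talker's level from collapsing during short pauses between words, which reduces switching. It does not by itself reject a short spike from another mic or a constantly noisy mic; those need extra logic in the selection step.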

How should I make the decision in this situation, even assuming perfect network conditions, i.e. I receive all participants' audio packets at the same time with a fixed delay? How do I select the best top n from a batch of participants' audio packets?
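For illustration, one possible shape of such a selector is sketched below. It is not Jitsi's DominantSpeakerIdentification algorithm and not what Google Meet does; it is a simple heuristic under stated assumptions. Each tick it receives one level in dB per participant, scores each participant by how far the level sits above a slowly adapting estimate of that participant's own background level, requires a few consecutive speech-like ticks before admitting someone, and only evicts a current member whose minimum hold time has expired and who is beaten by a clear margin. The class names and every default value (margin_db, hold_ticks, onset_ticks, min_score_db, floor_rate, initial_floor_db) are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Track:
    floor_db: float       # slowly adapting estimate of the background level
    score_db: float = 0.0 # current level minus the background estimate
    streak: int = 0       # consecutive ticks with a speech-like score
    held_until: int = 0   # tick before which this member cannot be evicted


class TopNSelector:
    def __init__(self, n, margin_db=6.0, hold_ticks=25, onset_ticks=3,
                 min_score_db=10.0, floor_rate=0.01, initial_floor_db=-60.0):
        self.n = n
        self.margin_db = margin_db
        self.hold_ticks = hold_ticks
        self.onset_ticks = onset_ticks
        self.min_score_db = min_score_db
        self.floor_rate = floor_rate
        self.initial_floor_db = initial_floor_db
        self.tracks = {}
        self.active = set()
        self.tick = 0

    def _measure(self, pid, level_db):
        t = self.tracks.get(pid)
        if t is None:
            t = self.tracks[pid] = Track(min(level_db, self.initial_floor_db))
        if level_db < t.floor_db:
            t.floor_db = level_db  # drop to a new minimum immediately
        else:
            t.floor_db += self.floor_rate * (level_db - t.floor_db)  # rise slowly
        t.score_db = level_db - t.floor_db
        t.streak = t.streak + 1 if t.score_db >= self.min_score_db else 0

    def update(self, levels_db):
        """levels_db maps participant id -> level in dB for this tick."""
        self.tick += 1
        for pid, level in levels_db.items():
            self._measure(pid, level)
        self.active.intersection_update(levels_db)  # forget members who left
        candidates = sorted(
            (p for p in levels_db
             if p not in self.active
             and self.tracks[p].streak >= self.onset_ticks),
            key=lambda p: self.tracks[p].score_db, reverse=True)
        for pid in candidates:
            if len(self.active) >= self.n:
                evictable = [p for p in self.active
                             if self.tracks[p].held_until <= self.tick]
                if not evictable:
                    break
                weakest = min(evictable, key=lambda p: self.tracks[p].score_db)
                needed = self.tracks[weakest].score_db + self.margin_db
                if self.tracks[pid].score_db <= needed:
                    break  # remaining candidates are weaker still
                self.active.remove(weakest)
            self.active.add(pid)
            self.tracks[pid].held_until = self.tick + self.hold_ticks
        return set(self.active)
```

Usage would be one call per packet interval, e.g. `forward_to = selector.update({"alice": -20.0, "bob": -60.0})`. The onset streak is aimed at one-off spikes, the hold time and margin at frequent switching, and the per-participant floor at the mic that is constantly noisy. A talker who never pauses would see their own floor creep up too, so the constants would need tuning against real speech.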

Some related info:

Jitsi calculates the DominantSpeaker (top speaker) through iterative power calculation over time. I mainly wanted to apply a top n DominantSpeaker approach, but I don't know whether that is doable: DominantSpeakerIdentification.java, Paper behind the algo
Google Meet forwards n audio streams from the media server (but I'm not sure which algorithm they follow): google meet forwarding top n audio analysis by red5pro
