I have an SFU media server that simply relays each participant's audio packets to everyone except the sender. For a large-scale conference, forwarding every participant's audio to everyone stops being a good idea once the participant count goes beyond 50-100 (let's assume everyone's mic is on).

One workaround that came to mind is to relay only the audio packets of the top n loudest participants (by RMS value) and discard the rest. But this approach has some risks:

  1. A participant can be dropped from the top n while still talking (because of a noise or side-talk spike from someone else's mic), which can ruin the audio experience.
  2. Frequent false switches can happen between the top n members and the rest, which also creates chaos.
  3. A participant in a slightly noisy environment can stay in the top n all the time, whether or not they are talking.
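To make the three risks concrete, here is a rough sketch of one way they might be mitigated. It is not Jitsi's or Google Meet's algorithm. It assumes the server can compute a per-frame level in dBFS for each participant, and every name and constant in it (`TopNSelector`, `alpha`, `floor_rise`, `margin_db`, `hold_frames`, `active_db`) is hypothetical and untuned. Levels are smoothed so a single spike counts for little, each participant is scored relative to their own slowly tracked noise floor so constant background noise scores near zero, and a selected speaker keeps the slot for a hold period and can only be displaced by a challenger who beats them by a margin.

```python
import math
from dataclasses import dataclass


def rms_dbfs(samples):
    """RMS level of 16-bit PCM samples in dBFS (about 0 at full scale)."""
    if not samples:
        return -100.0
    rms = math.sqrt(sum(s * s for s in samples) / len(samples)) / 32768.0
    return 20.0 * math.log10(max(rms, 1e-5))


@dataclass
class _State:
    level: float        # smoothed level, dBFS
    floor: float        # estimated noise floor, dBFS
    score: float = 0.0  # dB above the participant's own noise floor
    hold: int = 0       # frames left before this participant may be evicted


class TopNSelector:
    """Hypothetical top-n selector; all defaults are illustrative assumptions."""

    def __init__(self, n, alpha=0.3, floor_rise=0.002,
                 margin_db=6.0, hold_frames=25, active_db=10.0):
        self.n = n
        self.alpha = alpha              # smoothing factor for the level
        self.floor_rise = floor_rise    # how fast the noise floor creeps upward
        self.margin_db = margin_db      # challenger must win by this much
        self.hold_frames = hold_frames  # hangover after the last active frame
        self.active_db = active_db      # score needed to count as speaking
        self.state = {}
        self.selected = set()

    def update(self, levels):
        """levels: {participant_id: frame level in dBFS}. Returns ids to forward."""
        # Forget participants who are no longer present.
        for pid in list(self.state):
            if pid not in levels:
                del self.state[pid]
                self.selected.discard(pid)

        for pid, db in levels.items():
            st = self.state.setdefault(pid, _State(level=db, floor=db))
            st.level += self.alpha * (db - st.level)
            if st.level < st.floor:
                st.floor = st.level  # the floor follows drops immediately
            else:
                st.floor += self.floor_rise * (st.level - st.floor)
            st.score = st.level - st.floor
            if st.score >= self.active_db:
                st.hold = self.hold_frames
            elif st.hold > 0:
                st.hold -= 1

        def score(pid):
            return self.state[pid].score

        # Active participants who are not forwarded yet, strongest first.
        challengers = sorted(
            (p for p in levels
             if p not in self.selected and score(p) >= self.active_db),
            key=score, reverse=True)
        for pid in challengers:
            if len(self.selected) < self.n:
                self.selected.add(pid)
                continue
            evictable = [p for p in self.selected if self.state[p].hold == 0]
            if not evictable:
                break
            weakest = min(evictable, key=score)
            if score(pid) < score(weakest) + self.margin_db:
                break
            self.selected.remove(weakest)
            self.selected.add(pid)
        return set(self.selected)
```

Calling `update` once per audio frame with `{participant_id: rms_dbfs(frame_samples)}` returns the ids whose packets would be relayed for that frame. The trade-off is latency: a longer hold and a larger margin give fewer false switches but make a genuinely new speaker wait longer for a slot.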

How should this decision be made, even assuming perfect network conditions, i.e. I receive all participants' audio packets at the same time with a fixed delay? How do I select the best top n from a batch of participants' audio packets?

Some related info:

  1. Jitsi calculates the DominantSpeaker (top speaker) through iterative power calculation over time. I mainly wanted to apply this to get the top n dominant speakers, but I don't know whether that is doable. See DominantSpeakerIdentification.java and the paper behind the algorithm.
  2. Google Meet forwards n audio streams from the media server (but I am not sure which algorithm they follow). See "Google Meet forwarding top n audio" analysis by red5pro.
Nafiul Alam Fuji
