In an SFU audio conferencing platform, the media server simply routes audio packets. Say that on the client side I keep an audio packet queue for each participant currently present (the list is updated by the signaling server), and at a fixed rate I dequeue from every queue, process the packets, pick the top 4-6 voice packets, and mix them for playback. If a sequence number is missing for some participant, I send a NACK and wait up to some threshold time before dequeuing from that participant's queue (to maintain the voice flow).
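To make the client-side loop concrete, here is a minimal sketch of roughly what I mean. All names (`Packet`, `ParticipantQueue`, `tick`) are hypothetical, and I am assuming each packet carries an audio level I can rank by. It ignores 16-bit sequence wraparound and reordering, and it only reports the gaps that would be NACKed instead of implementing the wait threshold.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Packet:
    ssrc: int       # participant identifier (hypothetical field name)
    seq: int        # RTP-style sequence number
    level: float    # audio level, higher = louder (assumed available per packet)
    payload: bytes


class ParticipantQueue:
    """One queue per participant, tracking the next expected sequence number."""

    def __init__(self):
        self.packets = deque()
        self.expected_seq = None

    def push(self, pkt):
        self.packets.append(pkt)

    def pop_next(self):
        """Return (packet, missing_seqs); packet is None if the queue is empty."""
        if not self.packets:
            return None, []
        pkt = self.packets.popleft()
        missing = []
        if self.expected_seq is not None and pkt.seq > self.expected_seq:
            # Everything between the expected and the received number is a gap.
            missing = list(range(self.expected_seq, pkt.seq))
        self.expected_seq = pkt.seq + 1
        return pkt, missing


def tick(queues, top_n=4):
    """One playout tick: dequeue one packet per participant, keep the loudest top_n."""
    candidates, nacks = [], {}
    for ssrc, queue in queues.items():
        pkt, missing = queue.pop_next()
        if missing:
            nacks[ssrc] = missing   # sequence numbers I would NACK
        if pkt is not None:
            candidates.append(pkt)
    candidates.sort(key=lambda p: p.level, reverse=True)
    return candidates[:top_n], nacks
```

The packets returned by `tick` are the ones that would be decoded and mixed; the `nacks` map is what would be sent back for retransmission.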
But to make this solution scalable, I have to move this step (dequeue, then pick the top 4-6 voices) to the media server and send only the selected packets to everyone. Now, on the client side, if a participant's sequence number is missing, I cannot tell whether the packet was actually lost or simply did not make it into the server's top 4-6 voice packets (I need to send a NACK and wait only if the packet was actually lost).
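A small sketch of the ambiguity, under the assumption that the server forwards the selected packets with their original sequence numbers unchanged. The function names and the level values are made up for illustration. Participant 7 is loud in the first and third tick but quiet in the second, so the server drops its packet on purpose, and the client sees exactly the same gap it would see after a real network loss.

```python
def forward_top_n(tick_packets, top_n=4):
    """Server side: keep only the top_n loudest packets of this tick."""
    ranked = sorted(tick_packets, key=lambda p: p["level"], reverse=True)
    return ranked[:top_n]


def observed_gaps(received_seqs):
    """Client side: sequence numbers absent between the first and last received."""
    seen = set(received_seqs)
    return [s for s in range(min(seen), max(seen) + 1) if s not in seen]


# Three ticks, five senders, top_n = 4. Sender 7's level dips in the second tick.
levels_per_tick = [
    {1: 0.9, 2: 0.8, 3: 0.7, 7: 0.6, 9: 0.1},
    {1: 0.9, 2: 0.8, 3: 0.7, 7: 0.05, 9: 0.1},
    {1: 0.9, 2: 0.8, 3: 0.7, 7: 0.6, 9: 0.1},
]

received_from_7 = []
for tick_index, levels in enumerate(levels_per_tick):
    # Every sender uses the same sequence number per tick to keep the demo short.
    packets = [
        {"ssrc": ssrc, "seq": 100 + tick_index, "level": level}
        for ssrc, level in levels.items()
    ]
    for pkt in forward_top_n(packets):
        if pkt["ssrc"] == 7:
            received_from_7.append(pkt["seq"])

# The client cannot tell from this gap alone whether to NACK.
gaps_for_7 = observed_gaps(received_from_7)
```

Here `gaps_for_7` contains the sequence number the server chose not to forward, even though nothing was lost in transit.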
How can I handle this use case efficiently? Any suggestions on how many voices to mix, or anything else, would be highly appreciated.