My impression is that when people talk about improving IPC performance or lowering IPC latency, the context switch is treated as the most important factor. But I have always wondered why I never hear that the number of runnable processes is also a factor.
If I understand correctly, on most modern CPUs a context switch can be performed by hardware, which should greatly reduce the time it costs. On the other hand, CPU time is shared among all the runnable processes in the system. The more runnable processes there are, the less often each process gets a chance to run. (In general most processes should be asleep most of the time, but I think that is just an assumption about the system that cannot hold in every possible case.)
Suppose, for example, that a system is configured with a round-robin scheduler, a 1ms time slice, and 7 runnable processes of the same priority:
P1 P2 P3 P4 P5 P6 P7
By definition of round-robin scheduling, the context switch order should be:
P1 -> P2 -> P3 -> P4 -> P5 -> P6 -> P7 -> P1 -> P2 -> ... -> P7 -> P1 -> ... and so on
Given this order, if P1 sends a message to P5 via some IPC mechanism, P5 will not handle it until at least 3ms later, because P2, P3 and P4 must each consume their own time slice before P5 is finally scheduled and can handle the message. So the IPC latency is at least 3ms in this case, which is far larger than the time needed for a context switch (typically on the order of microseconds)! If P5 then replies to P1's message, another 2ms is lost because P6 and P7 have to finish their turns first. The round-trip delay time (https://en.wikipedia.org/wiki/Round-trip_delay_time) would therefore be 3ms + 2ms = 5ms!
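To make the arithmetic concrete, here is a minimal sketch of the model I have in mind. The function name rr_wait_ms and its parameters are made up for illustration. It assumes a single CPU, strict round-robin order, every process always burning its full time slice (nobody blocks or yields early), a message posted right at the end of the sender's slice, and zero context switch cost. A real scheduler may behave quite differently.

```python
def rr_wait_ms(sender: int, receiver: int, n_procs: int, slice_ms: float = 1.0) -> float:
    """Delay before `receiver` gets the CPU after `sender` posts a message.

    Assumed model (not a real scheduler): strict round-robin over processes
    numbered 1..n_procs, each one consuming its whole time slice.
    """
    # Number of processes scheduled strictly between sender and receiver,
    # wrapping around the end of the run queue.
    between = (receiver - sender - 1) % n_procs
    return between * slice_ms


one_way = rr_wait_ms(1, 5, n_procs=7)  # P2, P3, P4 run first
reply = rr_wait_ms(5, 1, n_procs=7)    # P6, P7 run first
print(one_way, reply, one_way + reply)  # prints: 3.0 2.0 5.0
```

Under these assumptions the round trip between two distinct processes always comes out to (n_procs - 2) time slices, no matter which pair is chosen, so it grows linearly with the number of runnable processes.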
If the number of runnable processes is raised as follows:
P1 P2 P3 ... P99 P100
the IPC latency for a message sent from P13 to P57 will be (57 - 13 - 1)ms = 43ms.
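The same figure can be checked by stepping through the run queue instead of using the formula. This is only a toy simulation under my assumptions (one CPU, strict round-robin, every process uses its full 1ms slice, context switch cost ignored). The name simulate_wait_ms is hypothetical.

```python
from itertools import cycle


def simulate_wait_ms(sender: int, receiver: int, n_procs: int, slice_ms: int = 1) -> int:
    """Walk the round-robin order and add up the slices that run
    after `sender` and before `receiver` (toy model, not a real scheduler)."""
    order = cycle(range(1, n_procs + 1))

    # Advance the run queue until the sender has just had its turn.
    for pid in order:
        if pid == sender:
            break

    waited = 0
    # Every process scheduled before the receiver costs one full slice.
    for pid in order:
        if pid == receiver:
            return waited
        waited += slice_ms


to_p57 = simulate_wait_ms(13, 57, n_procs=100)
back_to_p13 = simulate_wait_ms(57, 13, n_procs=100)
print(to_p57, back_to_p13, to_p57 + back_to_p13)  # prints: 43 55 98
```

So in this model the one-way delay is 43ms, and a reply from P57 back to P13 would wait for the remaining 55 processes, giving a 98ms round trip.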
So, in conclusion: does the issue described above really exist? Does anyone take the number of runnable processes into account when testing or measuring IPC performance? If not, why doesn't the number of runnable processes in the system matter for IPC performance and latency?