In my understanding, when it comes to improving IPC performance or lowering IPC latency, the context switch seems to be considered the most important factor. But I have always wondered why I've never heard the number of runnable processes mentioned as a factor too.

If I understand correctly, on most modern CPUs a context switch can be performed by hardware, which should greatly reduce the time it costs. On the other hand, CPU time is shared by all runnable processes in the system: the more runnable processes there are, the less frequently each one gets a chance to run. (Granted, in general most processes are asleep most of the time, but that is just an assumption about the workload which cannot apply to every possible case, I think.)

Suppose, for example, a system uses a round-robin scheduler with a 1 ms time slice and has 7 runnable processes of the same priority, as follows:

    P1 P2 P3 P4 P5 P6 P7

By definition of round-robin scheduling, the context switch order should be:

    P1 -> P2 -> P3 -> P4 -> P5 -> P6 -> P7 -> P1 -> P2 -> ... -> P6 -> P7 -> P1 -> P2 -> ... -> P7 -> P1 -> ... and so on

Due to the context switch order above, if P1 sends a message to P5 via some IPC mechanism, P5 will not handle it until 3 ms later: P5 has to wait for P2, P3 and P4 to consume their time slices before it finally gets scheduled to run and handle the message. So the IPC latency is at least 3 ms in this case, which is much larger than the time needed for a context switch (typically on the order of microseconds)! If P5 then wants to reply to the message P1 sent, another 2 ms is spent because P6 and P7 have to finish their turns first. So the round-trip delay time (https://en.wikipedia.org/wiki/Round-trip_delay_time) is: 3 ms + 2 ms = 5 ms!

If the number of runnable processes is raised as follows:

    P1 P2 P3 ... P99 P100

the IPC latency for a message sent from P13 to P57 will be: (57 - 13 - 1) ms = 43 ms

So in conclusion ... Does the issue described above really exist? Do people take the number of runnable processes into account when testing or measuring IPC performance? Or why does the number of runnable processes in the system not matter in terms of IPC performance/latency?

Justin

1 Answer


Try out hackbench; it's interesting to see the results. Although it benchmarks the scheduler, you can change the code to benchmark the IPC instead.

Hackbench is both a benchmark and a stress test for the Linux kernel scheduler. Its main job is to create a specified number of pairs of schedulable entities (either threads or traditional processes) which communicate via either sockets or pipes and time how long it takes for each pair to send data back and forth.

http://www.makelinux.net/man/8/H/hackbench

If you want different kind of IPC than pipes and sockets, you can modify Hackbench source code.

Milind Dumbare
  • I've tried it and ran hackbench 1000 times. It took 0.012 seconds in the worst case. Command line: './hackbench 1 process 1' (usage: 'hackbench [-pipe] [process|thread] [loops]'), run 1000 times. Worst-case output: 'Running with 1*40 (== 40) tasks. Time: 0.012' (the hackbench source is from http://people.redhat.com/mingo/cfs-scheduler/tools/hackbench.c) – Justin Mar 05 '15 at 00:47
  • I don't know the exact logic hackbench uses for testing without digging into the code. But I guess the target is the default CFS scheduler rather than round-robin, which should still be a good reference, since the issue is supposed to exist in any scheduler that implements preemptive multitasking (http://en.wikipedia.org/wiki/Computer_multitasking#Preemptive_multitasking). – Justin Mar 05 '15 at 01:15