
I wrote a simple application that runs for 10 minutes and starts 10 threads once (pthreads). Each thread sleeps for 1 ms in a loop and does nothing else, yet top reports ca. 44% CPU usage. The CPU is an ARM9 at 450 MHz, and the OS is Linux 2.6.37. No other program is running. I tried different kernel configs (Dynamic Ticks, Soft/Hard IRQ, High Resolution Timer, ...) and different priorities (up to 99), but the numbers stay the same. /usr/bin/time -v shows ca. 5'200'000 voluntary context switches, and ca. 3 minutes are spent in kernel space. When each thread sleeps for ca. 5 ms instead, CPU utilization goes down to ca. 9%, which is IMO still crazy (40'500'000 cycles per second to save some registers). I used clock_nanosleep for sleeping (switching between CLOCK_REALTIME and CLOCK_MONOTONIC did not change anything).

I'm aware that a full context switch is expensive on ARM9 because the caches have to be flushed. But a simple thread switch, or a switch into the kernel, shouldn't be that expensive IMHO (the address space remains the same, so no cache/TLB flushing is required). Is this common, or should I try to find the bottleneck in the kernel?

azraiyl

1 Answer


You're busily waking up and going back to sleep at 100 µs intervals: 10 threads each sleeping for 1 ms means a wake-up every 100 µs on average. Keep in mind that each of those wake-ups involves two context switches, so you have a context switch every 50 µs on average, or 20,000 per second.

Perhaps that's the answer you're looking for?
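Spelled out, the rates in this answer work out as follows. The last line is not part of the answer; it simply divides the voluntary context switch count reported in the question by the 10-minute run time, for comparison.

```latex
\begin{align*}
\text{wake-up rate} &= \frac{10\ \text{threads}}{1\ \text{ms}} = 10{,}000\ \text{s}^{-1}
  &&\Rightarrow\ \text{one wake-up every } 100\ \mu\text{s} \\
\text{switch rate} &= 2 \times 10{,}000\ \text{s}^{-1} = 20{,}000\ \text{s}^{-1}
  &&\Rightarrow\ \text{one switch every } 50\ \mu\text{s} \\
\text{reported by time -v} &\approx \frac{5{,}200{,}000}{600\ \text{s}} \approx 8{,}667\ \text{s}^{-1}
\end{align*}
```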

Julie in Austin
  • Within a timeframe of 1 ms, 10 threads should run, so 10 context switches are needed. As I described, `/usr/bin/time` shows the number of context switches I expect. 100 µs are 45'000 cycles, and 44% of that is ca. 20'000 cycles (the result is almost identical if you take the 9% and 5 ms). That is 20'000 cycles to save/restore registers and choose another thread. I think there is something I'm missing here. – azraiyl Apr 15 '12 at 17:12
  • First off, scheduling a thread requires two context switches. The first is when thread 1 is stopped and the O/S gains control. The second is when thread 2 is dispatched. Context switches aren't "friendly" to a processor and result in massive disruption to the processor, its pipeline, cache controllers, etc. – Julie in Austin Apr 30 '12 at 07:05