
This is a purely conceptual question. Why don't OSes switch tasks whenever a branch that has never been executed before is encountered? Dynamic branch prediction only works for branches that have been seen in the past, and static branch prediction is only correct in certain scenarios. If there is no history for a branch, it seems like the OS and the processor should start feeding a separate task into the pipeline rather than blindly guessing the branch direction. The branch outcome could then be computed, and that path executed when the original task is scheduled again. The next time the branch is encountered, the processor can use dynamic prediction.

Is there a reason this method isn't used? Or is it used and I'm just unaware?
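For context, here is a rough, hypothetical micro-benchmark in C of the effect in question (not part of the original post): a branch with a consistent history becomes cheap once the predictor has learned it, while a data-dependent branch with no useful history stays expensive. The array size, threshold, and fill patterns are arbitrary placeholders.

```c
/* Hypothetical sketch: compare the cost of an unpredictable branch against a
 * perfectly predictable one.  Note: with optimizations the compiler may turn
 * the branch into a conditional move, so compile with -O0 or inspect the
 * generated code if the two timings come out identical. */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define N (1 << 24)

static double run(const unsigned char *data)
{
    clock_t start = clock();
    volatile long sum = 0;               /* volatile keeps the branch alive */
    for (size_t i = 0; i < N; i++) {
        if (data[i] >= 128)              /* the branch under test */
            sum += data[i];
    }
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}

int main(void)
{
    unsigned char *data = malloc(N);

    for (size_t i = 0; i < N; i++)
        data[i] = rand() & 0xFF;         /* random: branch is unpredictable */
    double unpredictable = run(data);

    for (size_t i = 0; i < N; i++)
        data[i] = (i & 1) ? 200 : 201;   /* always taken: predictor learns it */
    double predictable = run(data);

    printf("unpredictable: %.3fs, predictable: %.3fs\n",
           unpredictable, predictable);
    free(data);
    return 0;
}
```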

master565

2 Answers


The overhead of a context switch is far higher than the cost of simply executing the branch.
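A minimal, hypothetical sketch of the state involved (register names assume x86-64; a real kernel does this in hand-written assembly, and it also swaps page tables, FPU/SIMD state, and the kernel stack, which is where much of the cost hides):

```c
/* Sketch only: the per-task CPU state a context switch must spill and reload. */
#include <stdint.h>

struct cpu_context {
    /* Callee-saved integer registers plus stack/instruction pointers:
     * storing and reloading these is roughly the "20-30 moves" mentioned
     * in the comments below. */
    uint64_t rbx, rbp, r12, r13, r14, r15;
    uint64_t rsp;   /* stack pointer */
    uint64_t rip;   /* where to resume */
};

struct task {
    struct cpu_context ctx;
    void *mm;       /* address space: switching it also invalidates TLB entries */
};

/* switch_to(): store the running task's registers into prev->ctx, load
 * next->ctx into the machine registers, then resume at next->ctx.rip.
 * This is not expressible in portable C, so only the interface is sketched. */
void switch_to(struct task *prev, struct task *next);
```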

user3344003
  • I'm not too familiar with how a scheduler works. Does the overhead arise from having to configure the processor to the correct state to resume another task? – master565 Jun 15 '17 at 19:53
  • 1
    The processor has to save he complete state of the process (or thread), meaning all the registers that define the state.Then it has to load the registers for the new task. That is the equivalent of 20-30 move instructions. – user3344003 Jun 15 '17 at 20:22
  • That makes sense. I guess the only possible way it would be worth it is if you had an insanely large pipeline that had a larger flushing overhead than the context switching overhead. Thanks for the answer! – master565 Jun 15 '17 at 20:34
  • Plus a context switch is going to flush caches on many systems. – user3344003 Jun 16 '17 at 14:03
  • plus a context switch would have to wait for the pipeline to clear, which is... you know, exactly what you were trying to avoid... – bolov Aug 19 '17 at 09:41

This is why SMT (simultaneous multi-threading) was invented: while one thread fumbles around with its branch mispredict, the other threads on the same core can keep advancing; a rough experiment sketch follows the list below.

The newer POWER processors even have 8 hardware threads on each core to keep throughput up in the face of

  1. branch mis-predictions
  2. instruction cache misses
  3. data cache misses
  4. data dependencies
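As a rough illustration, the sketch below assumes Linux with GNU extensions, and assumes that logical CPUs 0 and 4 are SMT siblings of the same physical core (check /sys/devices/system/cpu/cpu0/topology/thread_siblings_list on your machine). It pins two threads full of hard-to-predict branches onto one core's two hardware threads; comparing the wall-clock time against running the same work twice sequentially shows how much stall time SMT can hide. The workload and CPU numbers are arbitrary placeholders, not anything from the answer.

```c
/* Sketch: two branch-mispredict-heavy threads pinned to the two SMT
 * hardware threads of one core. */
#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>
#include <stdlib.h>

static void *churn(void *arg)
{
    unsigned seed = (unsigned)(size_t)arg;
    volatile long sum = 0;
    for (long i = 0; i < 200000000L; i++) {
        /* data-dependent, hard-to-predict branch */
        if (rand_r(&seed) & 1)
            sum++;
        else
            sum--;
    }
    return NULL;
}

static void pin(pthread_t t, int cpu)
{
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    pthread_setaffinity_np(t, sizeof(set), &set);
}

int main(void)
{
    pthread_t a, b;
    pthread_create(&a, NULL, churn, (void *)(size_t)1);
    pthread_create(&b, NULL, churn, (void *)(size_t)2);
    pin(a, 0);   /* logical CPU 0: first hardware thread of the core (assumed) */
    pin(b, 4);   /* logical CPU 4: its assumed SMT sibling; adjust for your CPU */
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    /* Time this program (e.g. with `time`) and compare against running
     * churn() twice back to back on a single thread. */
    return 0;
}
```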
Surt