mov $10, %eax
add $2, %eax
mov $4, %ebx
mov $5, %ecx
add $1, %ebx
add $1, %ecx
add %ecx, %eax
add %ebx, %eax

Given the assembly above, a generic 5-stage pipeline would look something like the diagram below. However, because there is a data dependence, the first instruction won't write its result until stage 5 (WB), so the second instruction can't read that result yet. Where would you insert NOP instructions to wait for the earlier instruction to finish?

Instr. / Cycle  1   2   3   4   5   6   7
mov $10, %eax   IF  ID  EX  MEM WB
add $2, %eax        IF  ID  EX  MEM WB
mov $4, %ebx            IF  ID  EX  MEM WB

Edit: Not sure if this is right, but here is what I came up with:

                1   2   3   4   5   6   7   8   9   10  11  12  13  14  15  16  17  18  19  20
mov $10, %eax   F   D   E   M   W
NOP                 F   D   E   M   W
NOP                     F   D   E   M   W
NOP                         F   D   E   M   W
add $2, %eax                    F   D   E   M   W
mov $4, %ebx                        F   D   E   M   W
mov $5, %ecx                            F   D   E   M   W
NOP                                         F   D   E   M   W
NOP                                             F   D   E   M   W
add $1, %ebx                                        F   D   E   M   W
add $1, %ecx                                            F   D   E   M   W
NOP                                                         F   D   E   M   W
NOP                                                             F   D   E   M   W
NOP                                                                 F   D   E   M   W
add %ecx, %eax                                                          F   D   E   M   W
add %ebx, %eax                                                              F   D   E   M   W
blor
  • On the `x86` architecture seemingly used here, such dependencies are detected by the processor itself, which inserts stalls where necessary; there is no need for the programmer or compiler to insert explicit `NOP`s. Modern out-of-order implementations will in fact look further ahead to find additional work to execute instead of stalling. Are you asking how the logic for such detection would be implemented in a traditional 5-stage pipeline processor, or how a compiler targeting an architecture that requires manual pipeline tracking (as with delay slots) would detect the necessity for stalls? – doynax Apr 14 '17 at 06:30
  • I was asking where you would insert the NOP instructions to accommodate the data delay. – blor Apr 14 '17 at 07:42
  • I take it that the processor you are targeting lacks dependency tracking and expects compiler aid in scheduling? The only type of dependency in the given instruction sequence is between the execution and write-back stages, so you will need two stalls if the result is used in the very next instruction and one stall if it is used two instructions ahead. Regardless, you shouldn't insert NOPs unless you really have to. In this case you can mostly get rid of the bubbles by reordering the sequence instead. – doynax Apr 14 '17 at 07:59
  • In any event you will need to take care. There is no such thing as a standardized 5-stage pipeline, only a rough classic 5-stage RISC pipeline, and the details can and do vary from machine to machine. Be sure to review your architecture documentation carefully before proceeding, and most importantly run experiments in the debugger to see what the results are. – doynax Apr 14 '17 at 08:09
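
The rule of thumb from the comments (two stalls when a result is used by the very next instruction, one when it is used two instructions later) can be sketched in Python. This is only an illustration under stated assumptions: a hypothetical in-order machine with no forwarding and no hardware interlocks, where a consumer must sit at least three instruction slots after its producer. The names `MIN_DISTANCE`, `insert_nops`, `ORIGINAL` and `REORDERED` are made up for this example, and the real distance depends on the target pipeline.

```python
# Assumption: a reader must be issued at least 3 slots after the writer
# (i.e. two NOPs when it would otherwise be the very next instruction).
MIN_DISTANCE = 3

def insert_nops(program):
    """program: list of (text, dest_register, source_registers)."""
    out = []
    last_write = {}  # register -> index in `out` of its most recent writer
    for text, dest, sources in program:
        slot = len(out)  # earliest slot if there were no hazards
        for reg in sources:
            if reg in last_write:
                slot = max(slot, last_write[reg] + MIN_DISTANCE)
        out.extend(["nop"] * (slot - len(out)))  # pad up to the safe slot
        out.append(text)
        if dest is not None:
            last_write[dest] = len(out) - 1
    return out

# The sequence from the question. In AT&T syntax the last operand is the
# destination, and `add` also reads its destination.
ORIGINAL = [
    ("mov $10, %eax",  "eax", ()),
    ("add $2, %eax",   "eax", ("eax",)),
    ("mov $4, %ebx",   "ebx", ()),
    ("mov $5, %ecx",   "ecx", ()),
    ("add $1, %ebx",   "ebx", ("ebx",)),
    ("add $1, %ecx",   "ecx", ("ecx",)),
    ("add %ecx, %eax", "eax", ("ecx", "eax")),
    ("add %ebx, %eax", "eax", ("ebx", "eax")),
]

# One possible reordering: hoist the independent loads to the front so
# each `add` is already three slots away from the `mov` it depends on.
REORDERED = [ORIGINAL[i] for i in (0, 2, 3, 1, 4, 5, 6, 7)]

if __name__ == "__main__":
    for name, prog in (("original", ORIGINAL), ("reordered", REORDERED)):
        scheduled = insert_nops(prog)
        print(name, "->", scheduled.count("nop"), "NOPs")
        for line in scheduled:
            print("   ", line)
```

Under these assumptions the sketch pads the original order with 7 NOPs and the reordered sequence with 4. The remaining bubbles come from the final two `add` instructions, which both accumulate into `%eax` and therefore cannot be separated by reordering alone.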
