NOPs are literally processor instructions that would have to be inserted by a compiler or assembly language programmer into the instruction sequence when the program is constructed.
NOPs need only be inserted if the forwarding and stalling hardware is removed, so this question doesn't really apply to real hardware, only to some hypothetical hardware.
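To make the idea of software-inserted NOPs concrete, here is a minimal sketch in Python of what a compiler pass for such hypothetical hardware might do. The function name insert_nops, the tuple layout (text, dest, sources), and the gap parameter are all assumptions made up for illustration; the sketch only tracks register RAW dependences and ignores details such as writes to $zero.

```python
def insert_nops(program, gap=2):
    """Pad a straight-line program with 'nop' so that every consumer sits
    at least `gap` instructions after the producer of each source register.

    program: list of (text, dest, sources) tuples (hypothetical layout).
    Returns the list of instruction texts with NOPs inserted.
    """
    out = []          # instruction texts emitted so far
    last_write = {}   # register name -> index in `out` of its latest writer
    for text, dest, sources in program:
        # Earliest index this instruction may occupy.
        needed = len(out)
        for reg in sources:
            if reg in last_write:
                needed = max(needed, last_write[reg] + gap + 1)
        # Fill the distance with NOPs, then emit the instruction itself.
        out.extend(["nop"] * (needed - len(out)))
        out.append(text)
        if dest is not None:
            last_write[dest] = len(out) - 1
    return out


program = [
    ("add $t0, $t1, $t2", "$t0", ["$t1", "$t2"]),
    ("sub $t3, $t0, $t4", "$t3", ["$t0", "$t4"]),
]
padded = insert_nops(program, gap=2)
print("\n".join(padded))
```

With gap=2 the dependent sub is pushed two slots after the add; passing gap=3 would model hardware where a register read cannot see a write made in the same cycle.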
Stalls, on the other hand, are a valid approach for a processor implementation to mitigate hazards like read after write (RAW). They are sometimes strictly necessary and other times a poor alternative to forwarding.
Typically, between an instruction that targets a register, like add, and an instruction that uses that result, either two or three stall cycles would be necessary without forwarding in order for the second instruction to read the proper data.
Whether it is 2 or 3 cycles depends on the internal implementation, and the question is this: when the ID stage's register read overlaps with the WB stage's register write, can the read see the value being written in that same cycle? If the answer is yes, then a 2-cycle stall is all that is required. If the answer is no, then a 3-cycle stall is required, so that the WB stage fully completes its write to the register file and a subsequent ID read can read that result back out.
Most descriptions of MIPS implementations state that the WB stage is simple and can complete in the first half of the cycle, and that the ID stage is also simple and can complete in the second half of the same cycle. This means that for a WB write and an ID read that occur in the same cycle, the ID read will be able to see the value written by the WB write. Since the overlap works toward the desirable effect, only 2 stall cycles are necessary.
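The 2-versus-3 count can be checked with a little cycle arithmetic. The following is a rough sketch in Python, assuming the classic five-stage pipeline (IF, ID, EX, MEM, WB) with no forwarding; the function name stall_cycles and its parameters are made up for illustration.

```python
def stall_cycles(distance, same_cycle_read):
    """Stall cycles a consumer needs in a 5-stage pipeline without forwarding.

    distance: how many instructions after the producer the consumer is
              issued (1 means it immediately follows the producer).
    same_cycle_read: True if an ID read can see a value that WB writes
                     in the same cycle (write first half, read second half).
    """
    # Number cycles from the producer's IF: IF=1, ID=2, EX=3, MEM=4, WB=5.
    producer_wb = 5
    # Without stalls, the consumer's ID happens `distance` cycles after
    # the producer's ID.
    consumer_id = 2 + distance
    # Earliest cycle in which the consumer's ID reads the correct value.
    earliest_id = producer_wb if same_cycle_read else producer_wb + 1
    return max(0, earliest_id - consumer_id)


for d in (1, 2, 3, 4):
    print(d, stall_cycles(d, True), stall_cycles(d, False))
```

For a consumer that immediately follows the producer (distance 1), this gives 2 stall cycles when the same-cycle read works and 3 when it does not, matching the reasoning above.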
Why is the WB stage simple? Because there is nothing to compute (no addition/subtraction, no lookup) and the value to be written is available at the very beginning of the cycle. Because of this, the value goes into the register virtually immediately, and the ID read of the register file will pick up the newly written value by mere settling of the circuitry. So the first-half/second-half split is not even strictly required.