Back in the early days, processor pipelines were simple and branch prediction was not as accurate as it is now. So the delay slot, where the compiler statically schedules a useful instruction right after the branch, seemed like an efficient choice. But nowadays, with techniques like superscalar execution, out-of-order (OoO) execution, and advanced branch prediction, the delay slot has turned into a burden that has to be kept for binary compatibility. So I want to know how RISC architectures like MIPS deal with this problem in modern processors, such as the R10000 or something newer.
I also want to know: is the delay slot visible to the CPU, i.e. does the hardware have to treat it specially, or does it simply execute the instruction that follows a load or branch?
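To make clear which behavior I mean, here is a minimal sketch in Python of a toy machine with and without a branch delay slot. The instruction names (`li`, `addi`, `j`), the `PROGRAM` list, and the `run` function are all made up for illustration; this is not real MIPS encoding and says nothing about how an actual pipeline implements it.

```python
# Toy program: each instruction is a tuple (opcode, operands...).
# Jump targets are plain indices into this list (a simplification).
PROGRAM = [
    ("li", "r1", 1),            # 0: r1 = 1
    ("j", 4),                   # 1: jump to index 4
    ("addi", "r1", "r1", 10),   # 2: sits in the delay slot of the jump
    ("addi", "r1", "r1", 100),  # 3: skipped in both models
    ("addi", "r1", "r1", 1000), # 4: jump target
]


def run(program, delay_slot=True, max_steps=100):
    """Execute the toy program and return (registers, list of executed indices)."""
    regs = {}
    trace = []
    pc = 0
    pending = None  # jump target that takes effect after the delay-slot instruction
    steps = 0
    while 0 <= pc < len(program) and steps < max_steps:
        op, *args = program[pc]
        trace.append(pc)
        next_pc = pc + 1
        if pending is not None:
            # The previous instruction was a jump: this one is its delay slot,
            # so control transfers only after it has executed.
            next_pc = pending
            pending = None
        if op == "li":
            regs[args[0]] = args[1]
        elif op == "addi":
            regs[args[0]] = regs.get(args[1], 0) + args[2]
        elif op == "j":
            if delay_slot:
                pending = args[0]  # defer the jump by one instruction
            else:
                next_pc = args[0]  # jump takes effect immediately
        pc = next_pc
        steps += 1
    return regs, trace


if __name__ == "__main__":
    print(run(PROGRAM, delay_slot=True))
    print(run(PROGRAM, delay_slot=False))
```

With `delay_slot=True` the instruction at index 2 executes before control reaches index 4; with `delay_slot=False` it is skipped. My question is whether real hardware has to track something like `pending` explicitly, or whether this falls out of the pipeline naturally.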