Can interpreted languages use delay slots?

Question

When dealing with a pipelined architecture for executing instructions, one of the ways to avoid hazards is to use delay slots, or a rule that prevents certain instructions from accessing values computed in the lines above them. My understanding is that the assembler attempts to move around your instructions that don't depend on each other so that the non-dependent instructions can be executed while the dependent instructions wait. Is this feature possible or does this occur in the case of interpreted languages that have no real compile time?

(Note that if anything I said above reflects a gap in my understanding please correct it, because these concepts are new to me).

Most modern CPUs rely on automatic instruction reordering and don't expose features like "delay slots" to the program anyway. — oakad, Dec 11 '13 at 06:39

score 1 · Answer 1 · answered Dec 11 '13 at 08:15

The assembler normally doesn't move anything; reordering is the domain of compiler optimizations (at compile time) or of a JIT compiler at runtime, if one exists (as with Java and other JIT-compiled languages).
An interpreter is often a far simpler construct: it takes a single instruction at a time and executes it on the host system (translating along the way from one architecture to another, or from bytecode into machine code). It's theoretically possible to build an interpreter that shuffles code, but that's somewhat redundant, since JIT-compiled languages can recompile the entire code and get the reordering as part of that. It's also not very useful: the baseline interpreted mode is already so slow on the host CPU, due to the interpretation overhead, that simple code-shuffling tricks will hardly make a dent in performance.
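To make the "one instruction at a time" point concrete, here is a minimal sketch of an interpreter loop for a made-up stack bytecode. The opcode names (`push`, `add`, `mul`, `jump_if_zero`) and the `run` function are hypothetical, invented only for this illustration; real interpreters differ in many details.

```python
def run(program):
    """Execute a list of (opcode, operand) pairs on a tiny stack machine."""
    stack = []
    pc = 0  # index of the next instruction to fetch
    while pc < len(program):
        op, arg = program[pc]  # fetch and decode exactly one instruction
        pc += 1
        # Dispatch: each opcode maps to a fixed routine in the interpreter.
        if op == "push":
            stack.append(arg)
        elif op == "add":
            b = stack.pop()
            a = stack.pop()
            stack.append(a + b)
        elif op == "mul":
            b = stack.pop()
            a = stack.pop()
            stack.append(a * b)
        elif op == "jump_if_zero":
            if stack.pop() == 0:
                pc = arg
        else:
            raise ValueError(f"unknown opcode: {op}")
    return stack


# (2 + 3) * 4
program = [("push", 2), ("push", 3), ("add", None), ("push", 4), ("mul", None)]
result = run(program)
```

The loop only ever looks at the current instruction, so it has no view of later instructions it could move earlier. The fetch, decode, and dispatch work done per bytecode is also where most of the overhead goes.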

Also note that on modern hardware, most simple reordering is pointless: an out-of-order execution engine rearranges the code internally anyway, so each instruction can execute as soon as its data dependencies are resolved. For control dependencies, there are very good branch predictors available, so you rarely stall; you just speculate and flush if the prediction was wrong (which is worth it, as prediction accuracy can reach roughly 95% in most cases).
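As a rough illustration of "execute once the data dependencies are resolved", the toy model below compares in-order issue with out-of-order issue. Everything in it is an assumption made for the sketch: the `finish_cycle` function, the `(dest, sources, latency)` instruction format, one issue per cycle, and each register being written only once. It is not a model of any real CPU.

```python
def finish_cycle(instrs, in_order):
    """Return the cycle at which the last result becomes available.

    Each instruction is (dest, sources, latency); at most one issues per cycle.
    """
    # Registers produced by the program are unavailable until their producer
    # issues; any other register (a program input) is ready from cycle 0.
    ready_at = {dest: float("inf") for dest, _, _ in instrs}
    pending = list(instrs)
    cycle = 0
    last_done = 0
    while pending:
        # In-order issue may only consider the oldest pending instruction;
        # out-of-order issue may pick any pending instruction that is ready.
        candidates = pending[:1] if in_order else pending
        chosen = None
        for ins in candidates:
            if all(ready_at.get(src, 0) <= cycle for src in ins[1]):
                chosen = ins
                break
        if chosen is not None:
            dest, _, latency = chosen
            ready_at[dest] = cycle + latency
            last_done = max(last_done, cycle + latency)
            pending.remove(chosen)
        cycle += 1
    return last_done


program = [
    ("r1", ("a",), 3),   # load with latency 3
    ("r2", ("r1",), 1),  # depends on the first load
    ("r3", ("b",), 3),   # independent load
    ("r4", ("r3",), 1),  # depends on the second load
]
in_order_cycles = finish_cycle(program, in_order=True)       # 8
out_of_order_cycles = finish_cycle(program, in_order=False)  # 5
```

In this model the out-of-order version overlaps the second load with the wait for the first, which is the same effect a compiler would get by moving the independent load up by hand.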

There is still an important benefit to reordering, but it's not eliminating pipeline bubbles; it's mostly load hoisting, loop-invariant code motion, and eliminating false memory dependencies that the hardware can't reorder around by itself. However, this isn't a simple reordering you can do at interpretation time; you need actual compilation or JIT compilation for that.
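For readers unfamiliar with loop-invariant code motion, here is a hand-written before/after sketch. The function names `scale_naive` and `scale_hoisted` are made up for the example, and the hoisting is done manually here to show the kind of transformation an optimizing compiler or JIT would perform.

```python
import math


def scale_naive(values, angle):
    out = []
    for v in values:
        # This value does not depend on the loop variable,
        # yet it is recomputed on every iteration.
        factor = math.cos(angle) * 2.0
        out.append(v * factor)
    return out


def scale_hoisted(values, angle):
    # Loop-invariant code motion: compute the invariant once, before the loop.
    factor = math.cos(angle) * 2.0
    return [v * factor for v in values]
```

Moving `factor` out is only safe once you know that nothing in the loop body changes `angle` or the meaning of `math.cos`. That requires analyzing the whole loop, which is why it fits a compiler or JIT rather than an interpreter that sees one instruction at a time.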

Niggle: The Stanford MIPS used an assembler to do limited instruction reordering (and packing). "assembly language instruction set, defines instructions that are unpacked and have no pipeline dependencies or branch delays." ("Design of a High Performance VLSI Processor", Hennessy et al., 1983, TR#236). The intention was to allow assembly-level compatibility and implementation-specific machine code while allowing changes in implementation. (This is mostly just a historical footnote.) — , Dec 11 '13 at 16:12
@PaulA.Clayton, there are many dataflow and VLIW architectures that allow you to perform your code with predetermined dependencies, so you may reorder the code as you see fit. I think most of them employ dedicated compilers that do most of that work for you. — Leeor, Dec 11 '13 at 18:55

score 0 · Answer 2 · answered Dec 11 '13 at 05:14

Think of a Minecraft redstone computer. Minecraft is, in effect, an interpreter: a program that reads instructions and selects which internal functions/routines to run for its input directives in real time, rather than via compilation.

The interpreter itself - the Minecraft program in this case - may be able to make use of CPU-level tweaks, but the application - the redstone computer - can't.

One problem the redstone computer suffers from is that it is very low level: the interpreter provides very few constructs for implementing a computer. As a result, the whole thing is very data-driven, and there is minimal opportunity for the CPU to read ahead and optimize.

The higher the level, that is, the more complex the constructs your interpreter implements, the more its programs will benefit from CPU tweaks.

But no, a purely interpreted language can't.
