Instruction Level Parallelism (ILP) Methods

Question

I'm trying to learn about the methods used in instruction-level parallelism and the differences between them. My question is: given an instruction set that was originally designed for a processor without instruction-level parallelism, which of these methods can be used to achieve instruction-level parallelism on a new processor, and why/how? The new processor will execute the same instruction set and run the same, unmodified program binaries as the original one, but with better performance. The options are:

1) Out-of-order execution (Tomasulo's algorithm)

2) Pipelining

3) Superscalar

4) VLIW

score 0 · Accepted Answer · answered Apr 15 '16 at 03:31

I would say OOO is the first thing that will substantially increase ILP. OOO execution is a hardware technique that is entirely independent of the compiler: an OOO processor carries out the same computations as a CPU without OOO and produces the same results in less time, with no change to the instruction format at all.

Pipelining is a well-known and old technique for increasing ILP, but it has its limitations: adding stages increases hardware complexity and eventually gives diminishing returns.
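To make the diminishing-returns point concrete, here is a minimal sketch of an idealized pipeline timing model. It assumes no stalls or hazards, a fixed total logic delay split evenly across the stages, and a fixed per-stage latch overhead; the function name and the numeric values are hypothetical and chosen only for illustration.

```python
def pipeline_speedup(n_instructions, stages, logic_delay=1.0, latch_overhead=0.05):
    """Speedup of a `stages`-deep pipeline over an unpipelined datapath.

    Idealized model (assumption): no stalls, logic split evenly across stages,
    and every stage pays a fixed latch overhead on top of its share of logic.
    """
    # Unpipelined: each instruction takes the full logic delay, one after another.
    unpipelined_time = n_instructions * logic_delay
    # Pipelined: the clock period is one stage of logic plus the latch overhead.
    cycle_time = logic_delay / stages + latch_overhead
    # The first instruction needs `stages` cycles; each later one adds one cycle.
    pipelined_time = (stages + n_instructions - 1) * cycle_time
    return unpipelined_time / pipelined_time


for k in (5, 10, 20, 40):
    print(k, round(pipeline_speedup(1000, k), 2))
```

In this model the speedup can never exceed logic_delay / latch_overhead, so each extra stage buys less than the one before it. Real pipelines also lose cycles to hazards and branch mispredictions, which this sketch does not model.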

VLIW and superscalar are essentially the same idea, but they are different styles of parallelism. They require special hardware and special compilers, so they are not compatible with the conventional control-flow architecture. This technique essentially relies on the compiler to pack more than one instruction into one Very Long Instruction Word (VLIW) whose operations can be executed in parallel.

Thanks for the answer. What you said about OOO, pipelining and VLIW pretty much confirms my answer. The only difference between my answer and yours is superscalar. It seems that the basic idea of a superscalar architecture is not to rely on any special compiler and to let the hardware alone detect which instructions can run in parallel. That's why I thought superscalar would be able to run the same program binaries faster and more efficiently, since it depends only on the hardware and not on the compiler. — Mert Şeker, Apr 15 '16 at 08:02
If by superscalar CPU you mean any CPU that delivers IPC > 1 (instructions per cycle), then yes. Depending on the hardware, the way instructions are executed can be independent of the compiler's work. — Udai F.mHd, Apr 15 '16 at 13:41

Olsonist · Answer 2 · 2020-01-13T21:40:00.207

Start with pipelining. This is the oldest and best approach to achieving ILP: it overlaps the fetch, decode, execute, ... of multiple instructions. It is so common that any real CPU that uses OOO, in-order, superscalar, VLIW, ... to achieve ILP will also be pipelined.

Yes, OOO will achieve ILP. The first and third instructions below can execute in parallel, out of order, while the second must wait for the first to complete (RAW hazard on r1). The CPU's scheduler has to find the third instruction dynamically, ahead of its position in program order.

ld  r1, 0(r2)
add r2, r1, r3
add r4, r3, r5
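As a rough illustration of how a dynamic scheduler could find that third instruction, here is a toy out-of-order issue model applied to the three instructions above. It is only a sketch under stated assumptions: it tracks RAW dependences only (assuming register renaming has removed WAR and WAW hazards), the latencies in LATENCY are made up, and the names Instr and ooo_issue_cycles are hypothetical. It is not Tomasulo's algorithm itself, just the "issue whatever is ready" idea behind it.

```python
from typing import NamedTuple


class Instr(NamedTuple):
    op: str
    dest: str
    srcs: tuple


# Assumed latencies in cycles (illustrative only).
LATENCY = {"ld": 2, "add": 1}

PROGRAM = [
    Instr("ld", "r1", ("r2",)),        # ld  r1, 0(r2)
    Instr("add", "r2", ("r1", "r3")),  # add r2, r1, r3
    Instr("add", "r4", ("r3", "r5")),  # add r4, r3, r5
]


def ooo_issue_cycles(program, width=2):
    """Return the issue cycle of each instruction under a toy OOO model."""
    # For every instruction, record which earlier instructions produce its sources.
    producers, last_writer = [], {}
    for i, ins in enumerate(program):
        producers.append([last_writer[s] for s in ins.srcs if s in last_writer])
        last_writer[ins.dest] = i

    issue = [None] * len(program)
    done = [None] * len(program)
    cycle = 0
    while any(c is None for c in issue):
        issued = 0
        for i, ins in enumerate(program):
            if issue[i] is not None or issued == width:
                continue
            # Ready once every producer has finished (RAW dependences only).
            if all(done[p] is not None and done[p] <= cycle for p in producers[i]):
                issue[i] = cycle
                done[i] = cycle + LATENCY[ins.op]
                issued += 1
        cycle += 1
    return issue


print(ooo_issue_cycles(PROGRAM))
```

With these assumed latencies, the load and the independent add issue in the same cycle, and the dependent add issues only after the load has completed.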

You didn't mention in-order execution, but it can achieve ILP as well. Below, the first and second instructions can execute in parallel, but the third has to wait for the first to complete because of the same RAW hazard on r1.

ld  r1, 0(r2)
add r4, r3, r5
add r2, r1, r3
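The effect of instruction order on an in-order machine can be sketched with a toy dual-issue model. The assumptions are mine, not the answer's: instructions issue strictly in program order, at most `width` per cycle, an instruction stalls until its source registers are ready, and the latencies are made up. The names in_order_issue_cycles, ORIGINAL and REORDERED are hypothetical.

```python
# Assumed latencies in cycles (illustrative only).
LATENCY = {"ld": 2, "add": 1}

# (op, destination register, source registers)
ORIGINAL = [
    ("ld", "r1", ("r2",)),        # ld  r1, 0(r2)
    ("add", "r2", ("r1", "r3")),  # add r2, r1, r3
    ("add", "r4", ("r3", "r5")),  # add r4, r3, r5
]
REORDERED = [ORIGINAL[0], ORIGINAL[2], ORIGINAL[1]]


def in_order_issue_cycles(program, width=2):
    """Return the issue cycle of each instruction for a toy in-order core."""
    ready_at = {}          # register -> cycle in which its new value is available
    issue = []
    cycle, slots_used = 0, 0
    for op, dest, srcs in program:
        if slots_used == width:          # issue group is full: move to next cycle
            cycle, slots_used = cycle + 1, 0
        operands_ready = max((ready_at.get(s, 0) for s in srcs), default=0)
        if operands_ready > cycle:       # RAW stall: everything behind waits too
            cycle, slots_used = operands_ready, 0
        issue.append(cycle)
        slots_used += 1
        ready_at[dest] = cycle + LATENCY[op]
    return issue


print(in_order_issue_cycles(ORIGINAL))
print(in_order_issue_cycles(REORDERED))
```

In this model the original order leaves the independent add stuck behind the stalled dependent add, while the reordered sequence lets it issue alongside the load. The binary is the same instruction set either way, but only the reordered code exposes the parallelism to an in-order core.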

Superscalar and VLIW exist only for ILP. VLIW uses static, compile-time scheduling to achieve ILP. Superscalar uses execution-time scheduling by the CPU AND compile-time scheduling by the compiler to achieve ILP.
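For contrast, static scheduling can be sketched as a compile-time pass that packs independent instructions into fixed-width bundles. This is a simplified greedy packer, not any real compiler's algorithm. It assumes two slots per bundle, that RAW and WAW dependences force a later bundle, and that a WAR pair may share a bundle because reads happen before writes; it ignores operation latencies entirely. The name pack_bundles is hypothetical.

```python
# (op, destination register, source registers)
PROGRAM = [
    ("ld", "r1", ("r2",)),        # ld  r1, 0(r2)
    ("add", "r2", ("r1", "r3")),  # add r2, r1, r3
    ("add", "r4", ("r3", "r5")),  # add r4, r3, r5
]


def pack_bundles(program, width=2):
    """Greedily pack instructions into bundles; returns lists of instruction indices."""
    bundles = []   # bundles[b] = indices of the instructions placed in bundle b
    placed = []    # placed[i] = bundle index chosen for instruction i
    for i, (_, dest, srcs) in enumerate(program):
        earliest = 0
        for j in range(i):
            _, dest_j, srcs_j = program[j]
            if dest_j in srcs or dest_j == dest:
                # RAW or WAW on an earlier instruction: must go in a later bundle.
                earliest = max(earliest, placed[j] + 1)
            elif dest in srcs_j:
                # WAR: may share a bundle (assumed read-before-write), not earlier.
                earliest = max(earliest, placed[j])
        b = earliest
        while b < len(bundles) and len(bundles[b]) == width:
            b += 1                      # skip bundles that are already full
        if b == len(bundles):
            bundles.append([])
        bundles[b].append(i)
        placed.append(b)
    return bundles


print(pack_bundles(PROGRAM))
```

Here the packer puts the load and the independent add in the first bundle and the dependent add in the second. The result is a new instruction stream, which is why this approach needs recompilation rather than running the original binaries unchanged.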
