Minimal set of Assembly Instructions for an Intermediate Language?

Question

I was wondering the following:

Is it possible to create a small set of assembly instructions that together can express every possible operation? Or, asked differently, what are the must-have assembly instructions for just about any architecture?

(For example, Jump and Add would be necessary to do just about anything.)

I hope you guys can help me!

To provide some background information: I am trying to design an Intermediate Language for my compiler and I'd like to use as few instructions as possible (so that later a group of those instructions could be replaced by a single complex instruction on specific architectures). But of course the IL itself should be portable.

There certainly is a concept of "minimum required set". Many CPUs have been designed around the concept of "microcode" in which the machine instructions you provide are actually executed as "microcode" routines inside the CPU. The microcode is a more fundamental and smaller set of operations. It's really up to you how you wish to define it, as long as you get the functionality you require. As an intermediate language for a compiler, many languages have an intermediate "pcode". "pcode" is not normally down at the assembly level, however. — lurker, Jan 25 '14 at 15:59
Mmm, could you point me in the right direction on how to design your own microcode | Intermediate Language? Thanks for your answer! I am glad my assumptions weren't that crazy this time :P — , Jan 25 '14 at 16:03
There is a reason why stack based intermediate code has been used over and over again by compilers for a portable bytecode that can be easily targeted to any system. — old_timer, Jan 25 '14 at 16:48
You [only need to implement `mov`](http://www.cl.cam.ac.uk/~sd601/papers/mov.pdf). — Kerrek SB, Jan 25 '14 at 17:15

score 3 · Answer 1 · answered Jan 25 '14 at 16:11

I think you want the opposite. Instead of making an IL that is as simple as possible, you want one that is very expressive. The more expressive the IL is, the easier it may be to optimize for a specific architecture.

It is easier to expand a complex IL operation into many individual instructions than it is to coalesce many simple IL operations into a complicated instruction. You might not need multiply, since it can be done with jump and add instructions. But when you're compiling for a chip which has a hardware multiply, you'd have to analyze the IL to determine that this was an "add loop" and convert it back into a multiply. That's a lot more work than coming across a multiply and saying "hmm, this architecture can't do that, I guess we'll have to make it an add loop."
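To make the "easy direction" concrete, here is a minimal sketch of lowering a multiply into an add loop. It assumes a made-up three-address IL where every instruction is a tuple `(op, dst, a, b)`; the function name `lower_mul`, the op mnemonics, and the fixed temporary and label names are all hypothetical and only there for illustration.

```python
# Hypothetical three-address IL: each instruction is a tuple (op, dst, a, b).
# All names here (lower_mul, the mnemonics, "cnt", "loop", "done") are
# illustrative assumptions, not part of any real compiler.

def lower_mul(instr, target_has_mul):
    """Expand ("mul", dst, a, b) into an add loop if the target lacks multiply.

    Assumes b holds a non-negative integer and dst is distinct from a and b.
    A real compiler would also generate fresh temporary and label names.
    """
    op, dst, a, b = instr
    if op != "mul" or target_has_mul:
        return [instr]  # nothing to do: emit the instruction unchanged

    # dst = 0; cnt = b; while cnt != 0: dst += a; cnt -= 1
    return [
        ("const", dst, 0, None),
        ("mov", "cnt", b, None),
        ("label", "loop", None, None),
        ("jeq", "done", "cnt", 0),   # leave the loop once the counter hits 0
        ("add", dst, dst, a),
        ("sub", "cnt", "cnt", 1),
        ("jmp", "loop", None, None),
        ("label", "done", None, None),
    ]
```

Going the other way, from those eight simple instructions back to a single `mul`, would require pattern-matching the whole loop, which is the extra work this answer is warning about.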

As another example, you might think your IL doesn't need floating point operations, since some ARM chips have to do floating point in software anyway. But other ARM chips do have hardware floating point, and if your IL doesn't support FP operations then you'll need to recognize complicated software-FP IL sequences and convert them back into a single hardware instruction.

It's better to match your IL to the most advanced and complicated hardware features, and then "fall back" to "software emulation" of those features on processors that don't have them.

There is a trick to keep your IL minimal while still catering for target-specific capabilities (such as FPU, SIMD, etc.): intrinsic functions. — SK-logic, Jan 25 '14 at 16:16
Mmm Thanks for responding! Just one more thing: Isn't it way easier to optimize ILs if the set of instructions used is minimal? — , Jan 25 '14 at 16:22
@ChristianVeenman, the number of instructions is not that important. Much more important is the design of the intermediate language. Certain properties, such as SSA, are very useful for optimisations. Register machines are easier to optimise than stack machines (but the latter are much easier to target). Having explicit basic blocks makes all the optimisations easier but, again, makes code generation harder, so you may need another intermediate language in between. Intrinsics are easier to deal with, but they can make instruction-combining optimisations less transparent (see LLVM). — SK-logic, Jan 25 '14 at 20:47

score 2 · Accepted Answer · answered Jan 25 '14 at 16:23

The minimum is one instruction, and it has even been implemented in real hardware: in the carbon nanotube computer and in the MAXQ chip.
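As a rough illustration of what a one-instruction machine looks like, here is a small interpreter for `subleq` ("subtract and branch if less than or equal to zero"), one well-known single-instruction design. This is only a sketch: the function name `run_subleq`, the memory layout, and the "negative address means halt" convention are assumptions made for the example, and it is not meant to describe how either chip mentioned above actually works.

```python
def run_subleq(mem, pc=0, max_steps=10_000):
    """Run a subleq program stored in the list `mem` and return the memory.

    Each instruction is three consecutive cells a, b, c meaning:
        mem[b] -= mem[a]; if mem[b] <= 0: jump to c, else fall through.
    By assumption, a negative program counter halts the machine.
    """
    for _ in range(max_steps):
        if pc < 0:
            return mem
        a, b, c = mem[pc], mem[pc + 1], mem[pc + 2]
        mem[b] -= mem[a]
        pc = c if mem[b] <= 0 else pc + 3
    raise RuntimeError("step limit reached")


# Y = Y + X built from three subleq instructions, using a scratch cell Z = 0.
# Data cells: X at address 9, Y at address 10, Z at address 11.
program = [
    9, 11, 3,     # Z -= X        (Z now holds -X)
    11, 10, 6,    # Y -= Z        (Y now holds Y + X)
    11, 11, -1,   # Z -= Z        (Z back to 0, then halt)
    7, 5, 0,      # X = 7, Y = 5, Z = 0
]
result = run_subleq(program)
print(result[10])  # prints 12
```

Note that a plain addition already costs three instructions here, which is the kind of overhead the next paragraph refers to.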

Although one instruction is enough, in practice it is much more complicated than you might think, and it often takes more instructions to do the same work. If you need the chip's speed to be "usable", then IMO it should have at least some common instructions:

1 conditional jump instruction: jump on equal (or not equal)
1 SUB instruction for arithmetic. This way you can do both addition and subtraction easily without a negate instruction
1 bitwise instruction: NAND (or NOR). With either one of these you can build any logic operation needed
1 MOV instruction
1 load/store instruction

With the SUB or the bitwise instruction you can also move data, so depending on your architecture and the opcode size you may be able to remove MOV or load/store to simplify it even more.
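The claims about SUB and NAND can be checked with a short sketch. It models the two primitives as Python functions on an assumed 32-bit word (the `MASK` constant and all function names are illustrative choices, not part of any real ISA) and derives the other operations from them alone.

```python
MASK = 0xFFFFFFFF  # assumed 32-bit machine word

# The two primitives from the list above.
def sub(a, b):
    return (a - b) & MASK

def nand(a, b):
    return ~(a & b) & MASK

# Everything below is built only from sub() and nand().
def add(a, b):
    return sub(a, sub(0, b))          # a - (0 - b) == a + b (mod 2**32)

def not_(a):
    return nand(a, a)

def and_(a, b):
    return not_(nand(a, b))

def or_(a, b):
    return nand(not_(a), not_(b))     # De Morgan

def xor(a, b):
    n = nand(a, b)
    return nand(nand(a, n), nand(b, n))

def mov_via_sub(a):
    return sub(a, 0)                  # a - 0 copies a

def mov_via_nand(a):
    return not_(not_(a))              # double negation copies a
```

For example, `add(2, 3)` gives `5` and `xor(0b1100, 0b1010)` gives `0b0110`. Each derived operation costs two or more primitive instructions, which is the speed price of a smaller instruction set.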

Thanks a lot! I will accept your answer! But can I ask you one more question? As the other comment suggests, it's better to have a complex IL. Do you agree? Or do you think a minimal IL is better for optimizing? — , Jan 25 '14 at 20:24
An IL is often complex, since it will be translated to machine code anyway. Machine instruction sets nowadays, by contrast, are often simple, because that makes it easier to design each individual instruction and make it fast. But you should only worry about this if performance is important, and it's not easy to match even a fraction of a commercial product's performance at the same cost/hardware anyway, since that takes a lot of work and research, so the simpler the better. — phuclv, Jan 26 '14 at 05:35
