3

I am writing instructions for the ARMv7 processor. Why am I not allowed to use a constant (immediate) value in the MUL instruction itself? It is allowed with the ADD and SUB instructions, so why not MUL?

Cheers

Peter Cordes
Sam Rolls

3 Answers

7

If you look at the instruction encoding table of the ARM ISA, you'll see that among the arithmetic instructions only the data-processing group supports immediate operands (with an interesting encoding, by the way): they are the only ones with a 12-bit operand2 field. The multiply instructions have just 4-bit register fields such as Rs and Rm, and the bits that remain in between are used to distinguish them from other instruction categories.
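
To make the "interesting encoding" concrete, here is a minimal Python sketch of how a 12-bit operand2 immediate expands to a 32-bit value in the classic ARM (A32) encoding: the low 8 bits are rotated right by twice the value of the top 4 bits. The function name `decode_operand2_immediate` is made up for this illustration, and the sketch does not model the Thumb-2 variant of the encoding.

```python
MASK32 = 0xFFFFFFFF

def decode_operand2_immediate(operand2):
    """Expand a 12-bit A32 operand2 immediate field into its 32-bit value."""
    rot4 = (operand2 >> 8) & 0xF   # top 4 bits: rotation count, in units of 2 bits
    imm8 = operand2 & 0xFF         # low 8 bits: the constant being rotated
    amount = 2 * rot4              # actual rotate-right amount: 0, 2, ..., 30
    # 32-bit rotate right of imm8 by `amount`
    return ((imm8 >> amount) | (imm8 << (32 - amount))) & MASK32

print(hex(decode_operand2_immediate(0x0FF)))  # 0xff (no rotation)
print(hex(decode_operand2_immediate(0x4FF)))  # 0xff000000 (0xFF rotated right by 8)
```

A multiply instruction has no such field at all, which is why there is nowhere to put a constant.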

This decision probably comes from the fact that embedding immediates in multiplication instructions is not particularly useful: if the multiplier is known, mul is often a bad choice, as there are generally faster sequences of add/shift/sub. Also, a multiplication may cost enough cycles that the latency saved by having the immediate straight in the instruction does not justify "stealing" one of the 16 opcode slots available in the encoding for data-processing instructions.
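
As a rough illustration of the add/shift/sub idea, the Python sketch below models 32-bit registers and multiplies by a few small constants without a multiply. The comments show the ARM instruction each step is meant to correspond to; the helper names (`mul5`, `mul7`, `mul10`) are invented for this example, and whether such a sequence actually beats mul depends on the core.

```python
MASK32 = 0xFFFFFFFF  # model 32-bit register wrap-around

def mul5(x):
    # add r1, r0, r0, lsl #2    ; r1 = r0 + (r0 << 2) = 5 * r0
    return (x + (x << 2)) & MASK32

def mul7(x):
    # rsb r1, r0, r0, lsl #3    ; r1 = (r0 << 3) - r0 = 7 * r0
    return ((x << 3) - x) & MASK32

def mul10(x):
    # add r1, r0, r0, lsl #2    ; r1 = 5 * r0
    # lsl r1, r1, #1            ; r1 = 10 * r0
    return (mul5(x) << 1) & MASK32

for x in (0, 1, 123, 0xFFFFFFFF):
    assert mul5(x) == (5 * x) & MASK32
    assert mul7(x) == (7 * x) & MASK32
    assert mul10(x) == (10 * x) & MASK32
```

Each of the single-instruction cases works because the barrel shifter applies the shift to the second operand for free.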

Matteo Italia
  • 2
    Modern x86 typically has insanely powerful integer multiply (e.g. 3 cycle latency for any operand size up to 64-bit). I assume ARM doesn't spend that many transistors, but how slow is it these days? *When the ARM ISA was designed*, multipliers were probably slower than now. I guess with ARM's built-in barrel shifter, it's very efficient to shift and add or sub a multiplier with a few set bits, but what's the break-even on a typical ARM CPU? On x86 it's worth using 2 LEA instructions instead of an `imul r32, r/m32, imm8` if throughput matters more than latency, but 3 LEA isn't worth it. – Peter Cordes Mar 13 '18 at 11:48
  • From the beginning of processors to today, how fast a mul is has been determined by how much chip real estate you want to burn: one clock, 2, 4, 32, etc. Architecture (or age) is irrelevant; multiplication has been around longer than electricity, much less digital processors. If you look at the ARM TRMs, you'll see that some cores offer a choice of one or more clocks at compile time, so as a chip designer you can choose how much area to spend yourself. With x86 you get what you get; Intel picks. – old_timer Mar 13 '18 at 11:53
1

That's just how that processor architecture works. Different architectures have different requirements.

Typically, a choice like this is a compromise decision in order to optimize other operations/memory access/cache utilization. The tradeoff is usually far more beneficial than including that instruction on silicon.

David Hoelzer
0

Because ARM uses (somewhat) fixed-length instructions, there aren't that many bits left for an immediate. There is a barrel shifter, though, so think scientific notation: you can have a cluster of bits shifted into position, but not bits spread out across the word (other than a cluster that wraps around the top). So 0x00000099, 0x00099000 and 0x09900000 are generally okay, depending on the instruction and instruction set, but 0x00900090 is likely not, and certainly not 0x12345678, which wouldn't make sense. ARM is RISC, not CISC; it is not meant to have every instruction be able to do everything. Think load/store architecture rather than CISC.
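
The "cluster of bits" rule can be checked mechanically. The Python sketch below tests whether a 32-bit constant fits the classic ARM (A32) data-processing immediate, i.e. an 8-bit value rotated right by an even amount. The name `is_a32_immediate` is made up for this example, and Thumb-2 has extra patterns that this sketch does not model, which is part of why the answer says "depending on the instruction/set".

```python
MASK32 = 0xFFFFFFFF

def rotate_left32(value, amount):
    """Rotate a 32-bit value left by `amount` bits (0 <= amount < 32)."""
    value &= MASK32
    return ((value << amount) | (value >> (32 - amount))) & MASK32

def is_a32_immediate(value):
    """True if `value` is an 8-bit constant rotated right by an even amount."""
    # Undo every possible even rotation and see if the result fits in 8 bits.
    return any(rotate_left32(value, amount) <= 0xFF for amount in range(0, 32, 2))

for constant in (0x00000099, 0x00099000, 0x09900000, 0x00900090, 0x12345678):
    print(f"{constant:#010x}: {is_a32_immediate(constant)}")
# 0x00000099: True
# 0x00099000: True
# 0x09900000: True
# 0x00900090: False
# 0x12345678: False
```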

old_timer