Why can't the first two arguments to the MUL expression on ARM7 be the same?

Question

I've been helping with labs on a course in ARM7 assembly language and today encountered a problem where a student had entered the following instruction:

MUL R0, R0, R1

The code didn't assemble. The solution is to change the instruction to:

MUL R0, R1, R0

i.e. the first two arguments of MUL cannot be the same register. I already knew this as it is part of the documentation for ARM: http://infocenter.arm.com/help/topic/com.arm.doc.dui0489i/DUI0489I_arm_assembler_reference.pdf
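To make the rule concrete, here is a minimal, hypothetical lint check in Python. The function `check_mul` and its `arch_version` parameter are illustrative inventions, not part of any ARM tool. It assumes the armasm operand order `MUL Rd, Rn, Rm` used by the reference linked above and only understands the plain three-register form.

```python
import re
from typing import Optional

# Hypothetical lint rule: in armasm syntax "MUL{S}{cond} Rd, Rn, Rm",
# Rn must differ from Rd on architectures before ARMv6.
# Only the explicit three-register form is recognised by this sketch.
MUL_RE = re.compile(
    r"^\s*MUL\w*\s+(R\d+)\s*,\s*(R\d+)\s*,\s*(R\d+)\s*$", re.IGNORECASE
)


def check_mul(line: str, arch_version: int = 4) -> Optional[str]:
    """Return a warning if `line` breaks the pre-ARMv6 rule, else None."""
    match = MUL_RE.match(line)
    if match is None:
        return None  # not a three-operand MUL that this sketch understands
    rd, rn, _rm = (reg.upper() for reg in match.groups())
    if arch_version < 6 and rd == rn:
        return f"{line.strip()}: Rd ({rd}) and Rn ({rn}) must differ before ARMv6"
    return None


for src in ("MUL R0, R0, R1", "MUL R0, R1, R0"):
    print(src, "->", check_mul(src) or "ok")
```

Swapping the two source operands is always possible because multiplication is commutative, which is why the student's fix works.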

The student was happy enough that their problem was fixed, but I'm rather frustrated that I don't know why ARM7 requires that the arguments be passed like this. I thought that it might have something to do with one of the registers being used to store intermediate values while the multiplier was shifting and adding, but I'm not even sure if that's how multiplication works on ARM (in fact, I'm fairly sure it's not). Why is the order of the arguments so important here?

A bug in the IP, most likely, especially in the older ARM7 days when you had a foundry layout, not Verilog. Also, with the bug they can try violating the rule and see whether you are using real ARM IP, a clone, or stolen ARM IP... A number of the "unpredictable results" items in the ARM ARM fall into this category (IMO). There are cases where the actual bug is only on some cores and not others. If you have access to the old paper versions of the ARM ARM through to the electronic ones, you can see these things come and go. — old_timer, Oct 09 '15 at 21:13

score 4 · Answer 1 · answered Aug 20 '17 at 15:44

The fact that "Rn must be different from Rd in architectures before ARMv6" suggests it's a design limitation of how multiplies were implemented in the original three-stage ARM pipeline. Architectures before ARMv6 include the ARM7 and earlier designs, which used a simple three-stage pipeline. Unlike most instructions, multiplication takes multiple cycles to execute, and based on the instruction set limitation it appears that your suspicion is correct: the destination register Rd is modified each cycle to calculate the result.

The paper "Verifying ARM6 Multiplication" by Anthony Fox supports this by showing in Figure 4 (reformatted below to fit the limitations of Stack Exchange's markup) how Rd is modified during the execution of multiplication instructions by the ARM6 core:

t₃:
- Fetch an instruction
- Increment the program counter
- Set mul1 to reg[Rs]
- Set borrow to false
- Set count1 to zero
- Set reg[Rd] to reg[Rn] if accumulate, otherwise zero
- Set mul to mul1[1:0]
- Set mul2 to mul1[31:2]
- Set borrow2 to borrow
- Set mshift to MSHIFT2(borrow,mul,count1)

t_n:
- Set alub to reg[Rm] shifted left by mshift
- Set alua to reg[Rd]
- Set mul1 to mul2[29:0]
- Set borrow to mul[1]
- Set count1 to mshift[4:1] + 1
- Set reg[Rd] to ALU6*(borrow2,mul,alua,alub)
- Set mul to mul1[1:0]
- Set mul2 to mul1[31:2]
- Set borrow2 to borrow
- Set mshift to MSHIFT2(borrow,mul,count1)
- Update NZC flags of CPSR (if S flag set)
- If the last iteration then decode the next instruction

Fig. 4: ARM6 implementation of the multiply instructions. Each cycle is split into two phases. The t_n cycle is repeated until MULX(mul2,borrow,mshift) is true. Register Rd is not updated when Rd is equal to Rm or fifteen.
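The steps in Fig. 4 can be approximated with the following Python sketch. It keeps the figure's overall structure: the set-up cycle clears (or preloads) reg[Rd] and latches Rs, each repeated cycle reads Rm shifted by the current amount and rewrites reg[Rd], and `borrow` is set to `mul[1]`. The radix-4 Booth-style digit recoding and the stop condition are assumptions standing in for Fox's MSHIFT2, ALU6* and MULX, whose exact definitions are not reproduced here; `booth2_multiply` is a hypothetical name.

```python
MASK32 = 0xFFFFFFFF


def booth2_multiply(rm: int, rs: int, rn: int = 0, accumulate: bool = False):
    """Simplified 2-bits-per-cycle multiply, loosely following Fig. 4.

    Assumptions (not Fox's exact definitions): each cycle combines two
    multiplier bits `mul` with the incoming `borrow` into a radix-4 Booth
    digit in {-2, -1, 0, 1, 2}; the loop stops once the remaining multiplier
    bits and the borrow are both zero (a stand-in for MULX).
    Returns (product mod 2**32, number of repeated cycles).
    """
    rd = (rn if accumulate else 0) & MASK32  # t3: reg[Rd] := Rn or zero
    mul1 = rs & MASK32                       # t3: mul1 := reg[Rs]
    borrow = 0                               # t3: borrow := false
    shift = 0
    cycles = 0
    while (mul1 != 0 or borrow) and shift < 32:
        mul = mul1 & 0b11                        # mul := mul1[1:0]
        digit = mul + borrow - 4 * (mul >> 1)    # Booth digit in -2..2
        alub = (rm << shift) & MASK32            # Rm shifted left by shift
        rd = (rd + digit * alub) & MASK32        # reg[Rd] rewritten each cycle
        borrow = mul >> 1                        # borrow := mul[1]
        mul1 >>= 2
        shift += 2
        cycles += 1
    return rd, cycles


print(booth2_multiply(7, 6))                            # (42, 2)
print(booth2_multiply(5, 4, rn=100, accumulate=True))   # (120, 2)
```

Any borrow left over after 16 cycles would be worth 4¹⁶ = 2³², so dropping it is harmless for a 32-bit result.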

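Since reg[Rd] is modified both during the initial set-up cycle t₃ and during the repeated t_n cycles, the result will be garbage if Rd == Rm, because the step "Set alub to reg[Rm] shifted left by mshift" expects to read the original, unmodified value of Rm, not the intermediate value currently stored in Rd. (Fox uses the encoding's operand names, MUL Rd, Rm, Rs, so his Rm is the register the armasm reference calls Rn, i.e. the second operand: R0 in MUL R0, R0, R1.) Rs, by contrast, is read only once, into mul1 during t₃, which is consistent with MUL R0, R1, R0 (Rd == Rs) being permitted.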
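A toy register-file model shows the aliasing problem directly. This is an illustrative sketch, not the ARM6 datapath: it uses plain 2-bit shift-and-add instead of the recoding in Fig. 4, and according to the figure caption the real ARM6 suppresses the write to Rd altogether when Rd equals Rm. The functions `mul_on_regfile` and `run` are hypothetical, and operands follow the encoding order `MUL Rd, Rm, Rs`.

```python
MASK32 = 0xFFFFFFFF


def mul_on_regfile(regs: list, rd: int, rm: int, rs: int) -> None:
    """Toy model of MUL Rd, Rm, Rs (encoding operand order) on a register file.

    Illustrative assumptions: plain 2-bits-per-cycle shift-and-add, Rs latched
    once in the set-up cycle, reg[Rm] re-read and reg[Rd] rewritten on every
    repeated cycle.
    """
    mul1 = regs[rs]  # set-up cycle: latch the multiplier (Rs) once
    regs[rd] = 0     # set-up cycle: clear the accumulator (no MLA here)
    for shift in range(0, 32, 2):
        mul = (mul1 >> shift) & 0b11
        alub = (regs[rm] << shift) & MASK32  # Rm is read again every cycle
        regs[rd] = (regs[rd] + mul * alub) & MASK32


def run(rd: int, rm: int, rs: int) -> int:
    """Start with R0 = 6, R1 = 7, execute the toy MUL, return R0."""
    regs = [0] * 16
    regs[0], regs[1] = 6, 7
    mul_on_regfile(regs, rd, rm, rs)
    return regs[0]


# armasm "MUL R0, R0, R1" encodes as Rd=R0, Rm=R0, Rs=R1: Rm is clobbered.
print(run(rd=0, rm=0, rs=1))  # 0, not 42
# armasm "MUL R0, R1, R0" encodes as Rd=R0, Rm=R1, Rs=R0: safe.
print(run(rd=0, rm=1, rs=0))  # 42
```

In the aliased case the set-up cycle has already overwritten the multiplicand before the first repeated cycle reads it; in the swapped case only Rs is shared with Rd, and it was latched before the overwrite.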

Certain ARM7 CPUs had a "fast multiplier" that processed 8 bits per cycle rather than 2 bits per cycle as described above, but it appears to modify a register during the calculation as well.
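The difference between 2 and 8 bits per cycle can be sketched as a shift-and-add loop parameterised by `k`, the number of multiplier bits consumed per cycle. This is a simplification that assumes unsigned early termination only; `multiply_k_bits` is a hypothetical name, and the cycle counts it reports are illustrative, not ARM7 timing figures. In both configurations the accumulator standing in for Rd is still written on every cycle.

```python
MASK32 = 0xFFFFFFFF


def multiply_k_bits(rm: int, rs: int, k: int):
    """Unsigned shift-and-add consuming k multiplier bits per cycle.

    Simplified sketch: the accumulator (standing in for Rd) is written every
    cycle, and the loop stops early once the remaining multiplier bits are
    all zero. Real ARM7 early-termination rules (e.g. for negative
    multipliers) are not modelled. Returns (product mod 2**32, cycles).
    """
    acc, cycles, shift = 0, 0, 0
    remaining = rs & MASK32
    while True:
        chunk = remaining & ((1 << k) - 1)           # next k multiplier bits
        acc = (acc + chunk * (rm << shift)) & MASK32  # accumulator updated
        remaining >>= k
        shift += k
        cycles += 1
        if remaining == 0:
            break
    return acc, cycles


for k in (2, 8):
    # prints 15 cycles for k=2 and 4 cycles for k=8
    print(f"k={k}: {multiply_k_bits(3, 0x12345678, k)[1]} cycles")
```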

score 1 · Answer 2 · answered Oct 09 '15 at 21:18


My guess is that this is a bug in the IP for one or more cores.

It is significantly easier, especially in the ARM7 days when you were given a layout from ARM rather than source code for the core, to have a compiler work around an IP bug than to fix the bug, recall all the units and scrap the ones in process, if the bug was found after a vendor had invested in the masks or was already in production.

Over time ARM (and others) have provided more things you can read to determine which specific core you have, so you can follow the errata for it (although software like Linux does a horrible job at this, applying the wrong errata to the wrong cores) and know which bugs to avoid.

Some of the "unpredictable results" were in fact predictable, just broken, and could be used by ARM to determine whether a core is a clone or stolen IP.

old_timer

Wow, that's really interesting. Thank you so much. So it was possibly a mistake in the hardware that was cheaper to work around than to fix, or else a deliberate error introduced as a spike to detect cloned IP cores? Cool! :D – Gary Oct 10 '15 at 12:35
That is my guess: that it was a bug, or a limitation inherent in the nature of the implementation. I don't think this would have been intentional protection of the IP, but any company's errata are a form of IP protection, as the company knows the true nature of the problem and its effects rather than just telling the public "it's broken, don't use it". – old_timer Oct 10 '15 at 12:47
Intentional IP protection would be, for example, in the undefined instructions or registers, I would assume... – old_timer Oct 10 '15 at 12:49