I've been helping with labs on a course in ARM7 assembly language and today encountered a problem where a student had entered the following expression:

MUL R0, R0, R1

The code didn't assemble. The solution is to change the expression to:

MUL R0, R1, R0

i.e. the first two arguments of MUL cannot be the same register. I already knew this as it is part of the documentation for ARM: http://infocenter.arm.com/help/topic/com.arm.doc.dui0489i/DUI0489I_arm_assembler_reference.pdf

The student was happy enough that their problem was fixed, but I'm rather frustrated that I don't know why ARM7 requires that the arguments be passed like this. I thought that it might have something to do with one of the registers being used to store intermediate values while the multiplier was shifting and adding, but I'm not even sure if that's how multiplication works on ARM (in fact, I'm fairly sure it's not). Why is the order of the arguments so important here?

Gary
  • A bug in the IP, most likely, especially in the older ARM7 days when you had a foundry layout, not Verilog. Also, with the bug they can try violating the rule and see whether you are using real ARM IP, a clone, or stolen ARM IP... A number of the "unpredictable results" items in the ARM ARM fall into this category (IMO). There are cases where the actual bug is only on some cores and not others. If you have access to the old paper versions of the ARM ARM up into the electronic ones, you can see these things come and go. – old_timer Oct 09 '15 at 21:13

2 Answers

The fact that "Rn must be different from Rd in architectures before ARMv6" suggests it's a design limitation of how multiplies were implemented in the original three-stage ARM pipeline. Architectures before ARMv6 include the ARM7 and earlier designs, which used a simple three-stage pipeline. Unlike most instructions, multiplication takes multiple cycles to execute, and based on the instruction set limitation it appears that your suspicion is correct: the destination register Rd is modified each cycle to calculate the result.

The paper "Verifying ARM6 Multiplication" by Anthony Fox supports this by showing in Figure 4 (reformatted below to fit the limitations of Stack Exchange's markup) how Rd is modified during the execution of multiplication instructions by the ARM6 core:

  • t3

    • Phase 1
      • Fetch an instruction
      • Increment the program counter
      • Set mul1 to reg[Rs]
      • Set borrow to false
      • Set count1 to zero

    • Phase 2
      • Set reg[Rd] to reg[Rn] if accumulate, otherwise zero
      • Set mul to mul1[1:0]
      • Set mul2 to mul1[31:2]
      • Set borrow2 to borrow
      • Set mshift to MSHIFT2(borrow,mul,count1)

  • tn

    • Phase 1
      • Set alub to reg[Rm] shifted left by mshift
      • Set alua to reg[Rd]
      • Set mul1 to mul2[29:0]
      • Set borrow to mul[1]
      • Set count1 to mshift[4:1] + 1

    • Phase 2
      • Set reg[Rd] to ALU6*(borrow2,mul,alua,alub)
      • Set mul to mul1[1:0]
      • Set mul2 to mul1[31:2]
      • Set borrow2 to borrow
      • Set mshift to MSHIFT2(borrow,mul,count1)
      • Update NZC flags of CPSR (if S flag set)
      • If the last iteration then decode the next instruction

Fig. 4: ARM6 implementation of the multiply instructions. Each cycle is split into two phases. The tn cycle is repeated until MULX(mul2,borrow,mshift) is true. Register Rd is not updated when Rd is equal to Rm or fifteen.

Since reg[Rd] is modified both during the initial setup cycle t3 and during the repeated tn cycles, the result will be garbage if Rd == Rm: the step "Set alub to reg[Rm] shifted left by mshift" expects to read the original, unmodified value of Rm, not the current intermediate value stored in Rd.
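The aliasing hazard can be illustrated with a much-simplified software model. This is only a sketch: the function name `mul_in_place` and the dictionary register file are made up for illustration, and plain shift-and-add on two multiplier bits per cycle stands in for the real ALU6*/MSHIFT2 logic from Fig. 4. What it keeps from the figure is the ordering: Rs is latched once, Rd is cleared in t3, and Rm is re-read from the register file on every tn cycle.

```python
MASK = 0xFFFFFFFF  # ARM registers are 32 bits wide


def mul_in_place(regs, rd, rm, rs):
    """Model MUL Rd, Rm, Rs with the running sum kept in regs[rd].

    Simplification (assumption): plain shift-and-add on two multiplier
    bits per cycle stands in for the Booth-style ALU6*/MSHIFT2 logic.
    """
    multiplier = regs[rs] & MASK   # t3: latched once, like mul1
    regs[rd] = 0                   # t3: Rd cleared (no accumulate)
    shift = 0
    while multiplier:              # tn: repeat while multiplier bits remain
        digit = multiplier & 0b11  # next two multiplier bits
        # The multiplicand is re-read from the register file each cycle.
        alub = (regs[rm] << shift) & MASK
        regs[rd] = (regs[rd] + digit * alub) & MASK
        multiplier >>= 2
        shift += 2
    return regs[rd]


if __name__ == "__main__":
    print(mul_in_place({0: 6, 1: 7}, rd=0, rm=1, rs=0))  # MUL R0, R1, R0 -> 42
    print(mul_in_place({0: 6, 1: 7}, rd=0, rm=0, rs=1))  # MUL R0, R0, R1 -> 0
```

In this model Rd may safely equal Rs, because Rs is copied into an internal latch before Rd is first written. That is consistent with the swapped operand order from the question assembling and working, while Rd == Rm loses the multiplicand as soon as Rd is cleared.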

Certain ARM7 CPUs had a "fast multiplier" that processed 8 bits per cycle rather than 2 bits per cycle as described above, but it appears to modify a register during the calculation as well.
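To get a rough feel for the difference between 2 and 8 bits per cycle, the sketch below counts loop iterations only. The helper `multiply_cycles` is hypothetical, and its termination rule (stop once the remaining multiplier bits are all zero) is an assumption standing in for the real MULX condition, which also has to deal with borrow and signed values. The counts are iteration counts, not documented instruction timings.

```python
def multiply_cycles(multiplier, bits_per_cycle):
    """Count iterations needed to consume a 32-bit multiplier.

    Assumed rule: each iteration retires `bits_per_cycle` low bits and
    the loop ends once no set bits remain (at least one iteration).
    """
    multiplier &= 0xFFFFFFFF
    cycles = 0
    while True:
        cycles += 1
        multiplier >>= bits_per_cycle
        if multiplier == 0:
            return cycles


if __name__ == "__main__":
    print(multiply_cycles(0xFFFFFFFF, 2))  # 16 iterations at 2 bits per cycle
    print(multiply_cycles(0xFFFFFFFF, 8))  # 4 iterations at 8 bits per cycle
```

Either way the multiply spans more than one cycle for large operands, so a register that is being updated in place is exposed to the same read-after-write problem.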

Ross Ridge

My guess is this is a bug in the IP for one or more cores.

It is significantly easier, especially in the ARM7 days when you were given a layout from ARM rather than source code for the core, to have a compiler work around an IP bug than to fix the bug, recall all the units, and scrap the ones in process, if the bug was found after a vendor had invested in the masks or was already in production.

With time, ARM (and others) have given you more things you can read to determine which specific core you have, and errata to follow (although software like Linux does a horrible job at this, applying the wrong errata to the wrong cores), so you know what bugs to avoid.

Some number of the "unpredictable results" were in fact predictable, just broken, and could be used by ARM to determine whether a core is a clone or stolen IP.

old_timer
  • Wow, that's really interesting. Thank you so much. So it was possibly a mistake in the hardware that was cheaper to work around rather than fix or else it is a deliberate error introduced as a spike to detect clone ip cores? Cool! :D – Gary Oct 10 '15 at 12:35
  • That is my guess... that it was a bug, or a limitation due to the nature of the implementation. I don't think this would have been intentional protection of the IP, but any erratum from any company is a form of IP protection, as they know the true nature of the problem and its effects rather than just telling the public it's broken, don't use it. – old_timer Oct 10 '15 at 12:47
  • Intentional IP protection would be, for example, in the undefined instructions or registers, I would assume... – old_timer Oct 10 '15 at 12:49