I have a requirement to multiply a 64-bit number, which is stored in 2 separate registers, by 64 mod 2^64. In this case the two 32-bit values are stored in R4 (low) and R5 (high). So far I have written this code:

LDR         R1,=Value
LDR         R2, [R1]
MOV         R3, #64
UMULL       R4, R5, R2, R3
MUL         R6, R4, R3
MUL         R7, R5, R3
MOV         R1, R6, LSR #32
MOV         R2, R7, LSR #32

Value       DCD         67108864

For the mod 2^64 part I do the last two MOV commands with 32-bit shifts. Is this correct? When I do this I always get the value 0 in R1 and R2. What can I do so that I don't get mod 2^64 = 0?

Florent
  • 2
    If it is a 64-bit integer number then isn't it always (and already) mod 2^64 if you ignore overflow? That is, just shift it and you're done? – Dave S Feb 17 '22 at 19:07
  • @DaveS Yeah, that's what I don't get sometimes. Some of the commands in our script have mod 2^32 or mod 2^64 behind the operation. For example ADDS R1, R2, R3 => R1 = R2 + R3 mod 2^32. So the mod 2^32 is there to check if you reached the "border" of a 32-bit register? – Florent Feb 17 '22 at 19:11
  • Yes, in some sense. This `mod` is a mathematical way to remind you that nothing beyond the low-order 32 bits of the result is captured; this is easy and natural for the hardware: there is no extra hardware or extra step to do this mod operation. It also tells you that no overflow check is done automatically; the result is simply truncated to fit back into a 32-bit register. Should probably be `R1 = (R2 + R3) mod 2^32`. – Erik Eidt Feb 17 '22 at 20:48
  • @ErikEidt So my two MOV commands are obsolete? Only the 2 MUL commands are enough? – Florent Feb 17 '22 at 20:57
  • There's no need for any `mul` instructions, just shifting since you're multiplying by a power of 2. (You do need to shift bits across the 32-bit boundary between low and high halves, though, possibly with shift/shift/OR, although ARM can fold one of those shifts into the OR.) – Peter Cordes Feb 17 '22 at 21:24
  • 1
    Yup, compiler output https://godbolt.org/z/4sz6fzanf uses lsl / orr (with a shifted source) for the high half, then lsl for the low half. I of course wrote a function instead of loading/storing a global. Or with just a 32-bit source, only two shifts. – Peter Cordes Feb 17 '22 at 22:41
  • OK, I see what you're saying, basically this after the UMULL: `MOV R6, R4, LSL #6` and `MOV R7, R5, LSL #6`. Both high and low shifted by 6 bits, since 2^6 = 64. And there are no other factors that I have to watch out for when doing this? What if I multiply the value 2^30 * 64? Wouldn't I get issues? – Florent Feb 18 '22 at 14:17
  • @Florent: Yes, just the shifts alone would give the wrong high half; that would shift zeros into the high half where it should have the high bits of the low half shifted in. So that's why you need the `ORR` in between, which you can see if you look closely at the compiler output Peter linked. – Nate Eldredge Feb 19 '22 at 17:19
  • @NateEldredge Yes, thanks, I forgot to write something back here. I got it now. Thanks to all of you for helping me understand this. – Florent Feb 20 '22 at 10:42
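
For anyone landing here later, the shift / ORR / shift idea from the comments can be sketched in C, with each 32-bit register modeled as a `uint32_t`. This is only an illustration, not the compiler output from the Godbolt link: the function name `mul64_by_64` is made up, and the ARM instructions in the comments assume the low half is in R4, the high half is in R5, and the result goes to R6 (low) and R7 (high).

```c
#include <stdint.h>

/* Multiply the 64-bit value hi:lo by 64 (mod 2^64) using only 32-bit
   operations. Hypothetical helper; register names in the comments are
   assumptions (R4 = low in, R5 = high in, R6 = low out, R7 = high out). */
static void mul64_by_64(uint32_t lo, uint32_t hi,
                        uint32_t *out_lo, uint32_t *out_hi)
{
    /* High half: shift left by 6; its top 6 bits fall off (the mod 2^64). */
    uint32_t new_hi = hi << 6;      /* MOV R7, R5, LSL #6      */

    /* The top 6 bits of the low half cross the 32-bit boundary and become
       the bottom 6 bits of the high half. */
    new_hi |= lo >> 26;             /* ORR R7, R7, R4, LSR #26 */

    /* Low half: shift left by 6; bits shifted out were captured above. */
    uint32_t new_lo = lo << 6;      /* MOV R6, R4, LSL #6      */

    *out_lo = new_lo;
    *out_hi = new_hi;
}
```

With the value from the question, 67108864 = 2^26 (so lo = 67108864, hi = 0), this gives high = 1 and low = 0, i.e. 2^32. With lo = 2^30 and hi = 0 it gives high = 16 and low = 0, i.e. 2^36. Without the OR step the high half would be 0 in both cases, which is the problem Nate describes.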

0 Answers