How is recursion possible in AVR Assembly?

Question

I can't seem to wrap my head around recursion in Assembly Language. I understand how it works in higher level languages, but I don't understand how it is possible in assembly when the return value cannot be passed directly to the function.

I'm trying to make a recursive factorial function in AVR assembly, but I don't understand how the stack passes the value when factorial requires n * factorial(n-1), which needs both n and n-1 at the same time.

Save `n` on the stack, do the `factorial(n-1)`, get `n` back from the stack, multiply, profit. — Jester, Nov 18 '17 at 02:11
High-level language code is compiled to assembly and then linked. Perhaps just examine what the compiler produces. — old_timer, Nov 18 '17 at 03:06
[AVR gcc 4.6.4 with `-O1` compiles `fac(int n)` into a recursive function](https://godbolt.org/g/x5BRsy). `-O2` or higher optimizes the recursion into a loop. — Peter Cordes, Nov 18 '17 at 03:52

Wagner Lipnharski · Answer 1 · 2021-09-22T22:30:16.363

I just helped another person with the small code below to calculate a factorial in AVR ATmega assembly. It computes factorials from 1! to 10!, the largest result being decimal 3628800 (hex 0x375F00). It does what the OP wanted: if you put 8 in R2 to get 8!, it moves 8 into the result bytes, then multiplies by number-1 and so on until it reaches 1, then it ends. The 24x8 multiplication is the trickiest I could write, saving registers and clock cycles. Apart from the return address of the single Rcall, it uses neither the stack nor RAM, only AVR registers.

; Input at R2, value 1~10, from 1! to 10!
; Result 1~3628800 (0x375F00) at:  R20:R21:R22 (LSB)
; Temporary Multiplication Middle Byte: R17

      ldi  r16, low(RAMEND)
      out  SPL, r16
      ldi  r16, high(RAMEND)
      out  SPH, r16

      Mov  R16, R2      ; Get Value to factor
      Rcall A0          ; Call Factorial
      ...
      
      
A0:   Clr  R20          ; Results = Number!
      Clr  R21          ;
      Mov  R22, R16     ; (Ldi only takes an immediate, so use Mov)

A1:   Cpi  R16, 2       ; If Number! < 2 then ended (also handles input 1)
      Brsh A2           ;
      Ret
                        ; This 24x8 multiplication is tricky, fast and saves bytes
A2:   Dec  R16          ; Number! - 1
      Mul  R22, R16     ; Mul Result LSB x Number!-1
      Mov  R22, R0      ; LSB Mul to Result LSB Byte 
      Mov  R17, R1      ; MSB Mul to Temporary Middle Byte

      Mul  R20, R16     ; Mul Result MSB x Number!-1
      Mov  R20, R0      ; LSB Mul to MSB Result Byte, ignore MSB Mul, will be zero
      
      Mul  R21, R16     ; Mul Result Middle x Number!-1
      Mov  R21, R0      ; LSB Mul to Result Middle Byte
      Add  R21, R17     ; Add Temporary Middle to Result Middle Byte
      Adc  R20, R1      ; Add MSB Mul with Carry to Result MSB Byte
      
      Rjmp A1
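
As a cross-check of the register arithmetic, here is a minimal C sketch that models the same 24x8 multiply from three 8x8 partial products. The names `u24`, `mul24x8` and `factorial24` are made up for this illustration and are not part of the assembly above; the struct fields stand in for R20 (hi), R21 (mid) and R22 (lo).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the three result registers: hi:mid:lo ~ R20:R21:R22. */
typedef struct { uint8_t hi, mid, lo; } u24;

/* One 24x8 multiply built from three 8x8 -> 16-bit products, like three MULs.
   Anything above 24 bits is dropped, as in the assembly. */
static u24 mul24x8(u24 a, uint8_t m)
{
    uint16_t p_lo  = (uint16_t)(a.lo  * m);   /* Mul R22, R16 */
    uint16_t p_mid = (uint16_t)(a.mid * m);   /* Mul R21, R16 */
    uint16_t p_hi  = (uint16_t)(a.hi  * m);   /* Mul R20, R16 */
    u24 r;

    r.lo = (uint8_t)p_lo;
    /* Add R21, R17: low byte of the middle product + high byte of the low product */
    uint16_t mid_sum = (uint16_t)((p_mid & 0xFF) + (p_lo >> 8));
    r.mid = (uint8_t)mid_sum;
    /* Adc R20, R1: low byte of the high product + high byte of the middle product + carry */
    r.hi = (uint8_t)((p_hi & 0xFF) + (p_mid >> 8) + (mid_sum >> 8));
    return r;
}

/* Start with n in the low byte, then multiply by n-1, n-2, ... down to 1. */
static uint32_t factorial24(uint8_t n)
{
    assert(n >= 1 && n <= 10);                /* 11! no longer fits in 24 bits */
    u24 r = { 0, 0, n };
    while (n > 1) {
        n--;
        r = mul24x8(r, n);
    }
    return ((uint32_t)r.hi << 16) | ((uint32_t)r.mid << 8) | r.lo;
}
```

Calling `factorial24(10)` should give 3628800 (0x375F00), matching the value quoted in the answer.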

Yes, in asm it's more efficient to optimize simple tail recursion into a loop, and factorial is kind of too trivial to be a good test of correct implementation of recursion. (Because you can still get the right answer even if it's just using push/pop of return addresses as a loop counter, but the loop body is using registers across iterations.) So this is an answer to the factorial part, but not the recursion part where factorial is just a placeholder problem. (Related: [What are good examples that actually motivate the study of recursion?](https://cseducators.stackexchange.com/a/4361)) — Peter Cordes, Sep 22 '21 at 23:57
The use of recursion is to not worry about "where am I" in the code, since multiple calls end up using multiple returns that put you back on track. This is typical of nested interrupts. Recursion ends up using lots of memory (stack) and overhead instructions, which means a high cost in CPU clock cycles. Memory access (stack) is time-consuming in a processor/controller; on the AVR it takes 2 or even 3 clock cycles, when 85% of the other instructions (register access and logic) use only 1 clock cycle. Writing in assembly because you want speed? Forget recursion. Some people don't know better. — Wagner Lipnharski, Sep 23 '21 at 17:41
Yeah, my sentiments exactly on these silly exercises that want you to use recursion for something that's better done without. That's what I said in my linked answer about assembly language on cseducators.SE. The fact remains that this question is titled "how is recursion possible in AVR assembly?", not "how to compute factorial efficiently?". It's certainly worth mentioning how dumb recursion is for this problem if you were going to answer the question asked, though. — Peter Cordes, Sep 23 '21 at 17:49

score 0 · Accepted Answer · answered Nov 18 '17 at 03:53

Using addition instead of multiplication

unsigned int accumulate(unsigned int n)
{
    if(n) return(n+accumulate(n-1));
    return(1);
}

and a different instruction set (ARM), perhaps easier to follow

00000000 <accumulate>:
   0:   e3500000    cmp r0, #0
   4:   0a000005    beq 20 <accumulate+0x20>
   8:   e3a03000    mov r3, #0
   c:   e0833000    add r3, r3, r0
  10:   e2500001    subs    r0, r0, #1
  14:   1afffffc    bne c <accumulate+0xc>
  18:   e2830001    add r0, r3, #1
  1c:   e12fff1e    bx  lr
  20:   e3a00001    mov r0, #1
  24:   e12fff1e    bx  lr

In this case the compiler didn't actually call the function; it detected what was going on and just made a loop.
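
To make the loop in that listing easier to read, here is a hand-written C sketch of what the generated code does, with the matching instructions noted in comments. `accumulate_loop` and `check_equivalence` are hypothetical names used only for this illustration; `accumulate` is the function from the answer.

```c
#include <assert.h>

/* The recursive function from the answer, unchanged in behavior. */
unsigned int accumulate(unsigned int n)
{
    if (n) return (n + accumulate(n - 1));
    return (1);
}

/* Hand-written equivalent of the compiler's loop. */
unsigned int accumulate_loop(unsigned int n)
{
    unsigned int sum = 0;          /* mov r3, #0 */

    if (n == 0) return 1;          /* cmp r0, #0 / beq / mov r0, #1 */
    do {
        sum += n;                  /* add r3, r3, r0 */
    } while (--n != 0);            /* subs r0, r0, #1 / bne */
    return sum + 1;                /* add r0, r3, #1 (the base case's 1) */
}

/* Hypothetical helper: both versions should agree for small n. */
void check_equivalence(void)
{
    for (unsigned int n = 0; n <= 100; n++)
        assert(accumulate(n) == accumulate_loop(n));
}
```

Note that with this definition `accumulate(4)` is 4 + 3 + 2 + 1 plus the 1 returned by the base case.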

Since there is nothing magic about recursion, there is no difference between calling the same function and calling some other function.

unsigned int otherfun ( unsigned int );
unsigned int accumulate(unsigned int n)
{
    if(n) return(n+otherfun(n-1));
    return(1);
}

00000000 <accumulate>:
   0:   e92d4010    push    {r4, lr}
   4:   e2504000    subs    r4, r0, #0
   8:   03a00001    moveq   r0, #1
   c:   0a000002    beq 1c <accumulate+0x1c>
  10:   e2440001    sub r0, r4, #1
  14:   ebfffffe    bl  0 <otherfun>
  18:   e0800004    add r0, r0, r4
  1c:   e8bd4010    pop {r4, lr}
  20:   e12fff1e    bx  lr

So this shows how it works. Instead of storing n itself on the stack, the cheaper solution, if you have the registers, is to pick a non-volatile register, save that register to the stack, then use it during the function. Which is better depends on how many registers you have and how many local intermediate values you need to track. So r4 gets a copy of n coming in, and after the call that copy is added to the return value from the call to the next function. (For factorial it would be a multiply, which depending on the instruction set and code can produce a lot more code and confuse the understanding, so I used addition instead.) With recursion, where the compiler didn't figure out what we were doing, this would have been a call to ourselves, and we can write this asm and make it a call to ourselves to see how it works.

Then the function returns the sum.

If we assume that otherfun is really accumulate, we enter this function with, let's say, a 4.

00000000 <accumulate>:
   0:   e92d4010    push    {r4, lr}
   4:   e2504000    subs    r4, r0, #0
   8:   03a00001    moveq   r0, #1
   c:   0a000002    beq 1c <accumulate+0x1c>
  10:   e2440001    sub r0, r4, #1
  14:   ebxxxxxx    bl  accumulate
  18:   e0800004    add r0, r0, r4
  1c:   e8bd4010    pop {r4, lr}
  20:   e12fff1e    bx  lr

r4 and lr are saved on the stack (call this r4-4 and lr-4)
r4 = n (4)
r0 = n-1 (3)
call accumulate with n-1 (3)
r4 (4) and lr are saved on the stack (r4-3, lr-3); lr now points back into accumulate
r4 = n (3)
r0 = n-1 (2)
call accumulate with n-1 (2)
r4 (3) and lr are saved on the stack (r4-2, lr-2)
r4 = n (2)
r0 = n-1 (1)
call accumulate with n-1 (1)
r4 (2) and lr are saved on the stack (r4-1, lr-1)
r4 = n (1)
r0 = n-1 (0)
call accumulate with n-1 (0)
r4 (1) and lr are saved on the stack (r4-0, lr-0)
now things change...
r4 = n (0), so r0 = 1 and the call is skipped
r4 gets 1 back from the stack
return to lr-0 which is into accumulate after the call to accumulate
r0 = r0 (1) + r4 (1) = 2
r4 gets 2 from the stack
return to lr-1 which is into accumulate
r0 = r0 (2) + r4 (2) = 4
r4 gets 3 from the stack
return to lr-2 which is into accumulate
r0 = r0 (4) + r4 (3) = 7
r4 gets 4 from the stack
return to lr-3 which is into accumulate
r0 = r0 (7) + r4 (4) = 11
r4 is restored to what it was before accumulate was first called
return to lr-4 which is the function that called accumulate
r4 is non-volatile: for this instruction set you have to return r4 the way you found it (as well as others, but we didn't modify those)

So the addition in this case (multiplication in your desired case) is

result = 1 + 1 + 2 + 3 + 4 = 11

How that happened is we basically pushed n on the stack, then called the function with n-1. In this case we push 4, 3, 2, 1; then we start to unwind that, and each return processes 1, then 2, then 3, then 4, essentially taking those from the stack as it returns.
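
The push-then-unwind pattern can be sketched in C with a hand-managed array standing in for the hardware stack. This is only an illustration under assumptions: `accumulate_explicit_stack` and `MAX_DEPTH` are hypothetical names, and return addresses are not modelled, only the saved values of n.

```c
#include <assert.h>

#define MAX_DEPTH 64   /* hypothetical depth limit for this sketch */

/* Models the recursion with an explicit stack: the "call" phase pushes each n,
   the "return" phase pops them and adds each one to the return value. */
unsigned int accumulate_explicit_stack(unsigned int n)
{
    unsigned int stack[MAX_DEPTH];
    unsigned int sp = 0;           /* number of values currently saved */
    unsigned int ret;

    assert(n <= MAX_DEPTH);        /* stay inside the array */

    while (n != 0) {               /* descend: save n, then "call" with n-1 */
        stack[sp++] = n;
        n--;
    }
    ret = 1;                       /* base case: n == 0 returns 1 */
    while (sp != 0) {              /* unwind: each "return" adds its saved n */
        ret += stack[--sp];
    }
    return ret;
}
```

For an input of 4 this pushes 4, 3, 2, 1, starts from the base case's 1, and pops 1, 2, 3, 4 on the way back. Changing `ret += stack[--sp]` to `ret *= stack[--sp]` would turn the same skeleton into a factorial.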

The bottom line is you don't have to care about recursion to support recursion: simply use an ABI that supports recursion, which is not hard to do, then hand-code the instructions in assembly as if you were the compiler.

Maybe this makes it easier to see. n is a parameter coming in, but for the duration of the function it is also a local variable, and local variables go on the stack.

unsigned int accumulate(unsigned int n)
{
    unsigned int m;
    m = n;
    if(n) return(m+accumulate(n-1));
    return(1);
}

back to this

unsigned int accumulate(unsigned int n)
{
    if(n) return(n+accumulate(n-1));
    return(1);
}

so independent of the instruction set

accumulate:
  if(n!=0) jump over
  return_reg = 1
  return
  over:
  push n on the stack
  first parameter (stack or register) = n - 1
  call accumulate
  pop or load n from the stack 
  return_reg = return_reg + n
  clean stack
  return

And also deal with return addresses for the instruction set if required. The ABI may use the stack to pass parameters or registers.

If I didn't follow the ARM ABI, I could implement

accumulate:
   cmp r0,#0
   bne over
   mov r0,#1
   bx lr
over:
   push {lr}
   push {r0}
   sub r0,#1
   bl accumulate
   pop {r1}
   add r0,r0,r1
   pop {lr}
   bx lr

For grins, an instruction set (PDP-11) that uses the stack for most things, not registers

00000000 <_accumulate>:
   0:   1166            mov r5, -(sp)
   2:   1185            mov sp, r5
   4:   10a6            mov r2, -(sp)
   6:   1d42 0004       mov 4(r5), r2
   a:   0206            bne 18 <_accumulate+0x18>
   c:   15c0 0001       mov $1, r0
  10:   1d42 fffc       mov -4(r5), r2
  14:   1585            mov (sp)+, r5
  16:   0087            rts pc
  18:   1080            mov r2, r0
  1a:   0ac0            dec r0
  1c:   1026            mov r0, -(sp)
  1e:   09f7 ffde       jsr pc, 0 <_accumulate>
  22:   6080            add r2, r0
  24:   65c6 0002       add $2, sp
  28:   1d42 fffc       mov -4(r5), r2
  2c:   1585            mov (sp)+, r5
  2e:   0087            rts pc

it does a stack frame thing (saves r5, then points r5 at the frame)
saves r2 to the stack so it can be used as a local
gets the n parameter from the stack into r2
branches if it is not zero
in the if zero case we set the return value to 1
clean up the stack and return
now in the if not zero case
make the first parameter n-1 and push it on the stack
call a function (ourself)
do the addition (return value plus n, still in r2), clean up the stack and return

The short answer: it is trivial to have n and n-1 in registers or memory at the same time, but when using recursion you aren't actually doing it that way; you have an accumulator and the next value to multiply into the accumulator, just like you would when using a loop: x = 1; while(n) { x*=n--; } or something like that. — old_timer, Nov 18 '17 at 03:57
The instruction set doesn't matter so long as it has the basic elements required to tolerate compiled languages like C, for example. AVR falls into that category, so it is possible. — old_timer, Nov 18 '17 at 03:58
