Yes, you can do this, even reasonably, without self-modifying code.
You turn your return addresses into a giant case statement.
The secret is understanding that a "return address" is just a way
to get back to point of the call, and that memory is just a giant
array of named locations.
Imagine I have a program with many logical call locations, with the instruction
after the call labelled:
CALL S
$1: ...
...
CALL T
$2: ...
...
CALL U
$3: ...
We need to replace the CALLs with something our machine can implement.
Let's also assume temporarily that only one subroutine call is active at any moment.
Then all that matters, is that after a subroutine completes, that control
returns to the point after the call.
You can cause this by writing the following SIMPLETRON code (I'm making up the syntax). By convention I assume I have a bunch of memory locations K1, K2, ... that contain the constants 1, 2, .. etc for as many constants as a I need.
K1: 1
K2: 2
K3: 3
...
LOAD K1
JMP S
$1: ...
...
LOAD K2
JMP T
$2: ...
...
LOAD K3
JMP U
$3:....
S: STORE RETURNID
...
JMP RETURN
T: STORE RETURNID
...
JMP RETURN
U: STORE RETURNID
...
JMP RETURN
RETURN: LOAD RETURNID
SUB K1
JE $1
LOAD RETURNID
SUB K2
JE $2
LOAD RETURNID
SUB K3
JE $3
JMP * ; bad return address, just hang
In essence, each call site records a constant (RETURNID) unique to that call site, and "RETURN" logic uses that unique ID to figure out the return point. If you have a lot of subroutines, the return logic code might be quite long, but hey, this is a toy machine and we aren't that interested in efficiency.
You could always make the return logic into a binary decision tree; then
the code might be long but it would only take log2(callcount) to decide how to get back, not actually all that bad).
Let's relax our assumption of only one subroutine active at any moment.
You can define for each subroutine a RETURNID, but still use the same RETURN code. With this idea, any subroutine can call any other subroutine. Obviously these routines are not-reentrant, so they can't be called more than once in any call chain.
We can use this same idea to implement a return stack. The trick is to recognize that a stack is merely a set of memory locations with an address decoder that picks out members of the stack. So, lets implement
PUSH and POP instructions as subroutines. We change our calling convention
to make the caller record the RETURNID, leaving the accumulator free
to pass a value:
LOAD K1
STORE PUSHRETURNID
LOAD valuetopush
JMP PUSH
$1:
LOAD K2
STORE POPRETURNID
JMP POP
$2:...
TEMP:
STACKINDEX: 0 ; incremented to 1 on first use
STACK1: 0 ; 1st stack location
...
STACKN: 0
PUSH: STORE TEMP ; save value to push
LOAD PUSHRETURNID ; do this here once instead of in every exit
STORE RETURNID
LOAD STACKINDEX ; add 1 to SP here, once, instead of in every exit
ADD K1
STORE STACKINDEX
SUB K1
JE STORETEMPSTACK1
LOAD STACKINDEX
SUB K2
JE STORETEMPSTACK2
...
LOAD STACKINDEX
SUB Kn
JE STORETEMPSTACKn
JMP * ; stack overflow
STORETEMPSTACK1:
LOAD TEMP
STORE STACK1
JMP RETURN
STORETEMPSTACK2:
LOAD TEMP
STORE STACK2
JMP RETURN
...
POP: LOAD STACKINDEX
SUB K1 ; decrement SP here once, rather than in every exit
STORE STACKINDEX
LOAD STACKINDEX
SUB K0
JE LOADSTACK1
LOAD STACKINDEX
SUB K1
JE LOADSTACK2
...
LOADSTACKn:
LOAD STACKn
JMP POPRETURN
LOADSTACK1:
LOAD STACK1
JMP RETURNFROMPOP
LOADSTACK2:
LOAD STACK2
JMP RETURNFROMPOP
RETURNFROMPOP: STORE TEMP
LOAD POPRETURNID
SUB K1
JE RETURNFROMPOP1
LOAD POPRETURNID
SUB K2
JE RETURNFROMPOP2
...
RETURNFROMPOP1: LOAD TEMP
JMP $1
RETURNFROMPOP2: LOAD TEMP
JMP $2
Note that we need RETURN, to handle returns with no value, and RETURNFROMPOP, that handles returns from the POP subroutine with a value.
So these look pretty clumsy, but we can now realize a pushdown stack
of fixed but arbitrarily large depth. If we again make binary decision trees out the stack location and returnID checking, the runtime costs are only logarithmic in the size of the stacks/call count, which is actually pretty good.
OK, now we have general PUSH and POP subroutines. Now we can make calls that store the return address on the stack:
LOAD K1 ; indicate return point
STORE PUSHRETURNID
LOAD K2 ; call stack return point
JMP PUSH
$1: LOAD argument ; a value to pass to the subroutine
JMP RECURSIVESUBROUTINEX
; returns here with subroutine result in accumulator
$2:
RECURSIVESUBROUTINEX:
...compute on accumulator...
LOAD K3 ; indicate return point
STORE PUSHRETURNID
LOAD K4 ; call stack return point
JMP PUSH
$3: LOAD ... ; some revised argument
JMP RECURSIVESUBROUTINEX
$4: ; return here with accumulator containing result
STORE RECURSIVESUBROUTINERESULT
LOAD K5
STORE POPRETURNID
JMP POP
$5: ; accumulator contains return ID
STORE POPRETURNID
LOAD RECURSIVESUBROUTINERESULT
JMP RETURNFROMPOP
That's it. Now you have fully recursive subroutine calls with a stack, with no (well, faked) indirection.
I wouldn't want to program this machine manually because building the RETURN routines would be a royal headache to code and keep right. But a compiler would be perfectly happy to manufacture all this stuff.