I'm trying to assemble this code using Keystone and execute it with the Unicorn engine:
start:
add r0, r0, #1
add r1, r1, #2
bl start
b start
In my opinion, the bl instruction should save the address of the next instruction to the lr register and then jump to start. So it'll be an infinite loop that adds 1 to r0 and 2 to r1.
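To make that expectation concrete, here is a toy pure-Python model of the trace I thought I would see. It is only a sketch of my mental model, not emulator code: the function name `expected_trace` and the hard-coded addresses 0, 4, 8 and 0xc are my own, taken from the disassembly of this particular program, and I'm assuming that in Thumb state bl stores the return address in lr with the low bit set.

```python
def expected_trace(steps):
    """Walk the four-instruction program the way I expected it to run."""
    regs = {"r0": 0, "r1": 0, "lr": 0}
    pc = 0
    trace = []
    for _ in range(steps):
        trace.append(pc)
        if pc == 0:            # add r0, r0, #1
            regs["r0"] += 1
            pc = 4
        elif pc == 4:          # add r1, r1, #2
            regs["r1"] += 2
            pc = 8
        elif pc == 8:          # bl start: lr = next instruction | Thumb bit
            regs["lr"] = (pc + 4) | 1
            pc = 0
        else:                  # b start (never reached in this model)
            pc = 0
    return trace, regs


trace, regs = expected_trace(6)
print(trace)  # [0, 4, 8, 0, 4, 8]
print(regs)   # {'r0': 2, 'r1': 4, 'lr': 13}
```

In this model the hook would print 0, 4, 8, 0, 4, 8 and so on, and b start would never execute because bl always branches away first.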
Apparently, I'm wrong, because bl start branches to itself instead!
I'm using Python wrappers for Keystone, Capstone and Unicorn to process the assembly. Here's my code:
import keystone as ks
import capstone as cs
import unicorn as uc
print(f'Keystone {ks.__version__}\nCapstone {cs.__version__}\nUnicorn {uc.__version__}\n')
code = '''
start:
add r0, r0, #1
add r1, r1, #2
bl start
b start
'''
assembler = ks.Ks(ks.KS_ARCH_ARM, ks.KS_MODE_THUMB)
disassembler = cs.Cs(cs.CS_ARCH_ARM, cs.CS_MODE_THUMB)
emulator = uc.Uc(uc.UC_ARCH_ARM, uc.UC_MODE_THUMB)
machine_code, _ = assembler.asm(code)
machine_code = bytes(machine_code)
print(machine_code.hex())
initial_address = 0
for addr, size, mnem, op_str in disassembler.disasm_lite(machine_code, initial_address):
    instruction = machine_code[addr:addr + size]
    print(f'{addr:04x}|\t{instruction.hex():<8}\t{mnem:<5}\t{op_str}')
emulator.mem_map(initial_address, 1024) # allocate 1024 bytes of memory
emulator.mem_write(initial_address, machine_code) # write the machine code
emulator.hook_add(uc.UC_HOOK_CODE, lambda uc, addr, size, _: print(f'Address: {addr}'))
emulator.emu_start(initial_address | 1, initial_address + len(machine_code), timeout=500)
This is what it outputs:
Keystone 0.9.1
Capstone 5.0.0
Unicorn 1.0.2
00f1010001f10201fff7fefff8e7
0000| 00f10100 add.w r0, r0, #1
0004| 01f10201 add.w r1, r1, #2
0008| fff7feff bl #8 ; why not `bl #0`?
000c| f8e7 b #0
Address: 0
Address: 4
Address: 8 # OK, we arrived at BL start
Address: 8 # we're at the same instruction again?
Address: 8 # and again?
Address: 8
< ... >
Address: 8
Address: 8
Traceback (most recent call last):
  File "run_ARM_bug.py", line 32, in <module>
    emulator.emu_start(initial_address | 1, initial_address + len(machine_code), timeout=500)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/unicorn-1.0.2rc3-py3.7.egg/unicorn/unicorn.py", line 317, in emu_start
unicorn.unicorn.UcError: Emulation timed out (UC_ERR_TIMEOUT)
The exception is not a problem (I set the timeout myself). The problem is that bl start always jumps to itself instead of start.
If I branch forward, however, everything works as expected. In this example, bl jumps to the correct address:
start:
; stuff
bl next
; hello
next:
add r0, r0, #1
bkpt
EDIT
I went on and assembled this code with Clang:
; test.s
.text
.syntax unified
.globl start
.p2align 1
.code 16
.thumb_func
start:
add r0, r0, #1
add r1, r1, #2
bl start
b start
I used the following command:
$ clang -c test.s -target armv7-unknown-linux -o test.bin -mthumb
clang-11: warning: unknown platform, assuming -mfloat-abi=soft
And then disassembled test.bin with objdump:
$ objdump -d test.bin
test.bin: file format elf32-littlearm
Disassembly of section .text:
00000000 <start>:
0: 00 f1 01 00 add.w r0, r0, #1
4: 01 f1 02 01 add.w r1, r1, #2
8: ff f7 fe ff bl #-4
c: ff f7 fe bf b.w #-4 <start+0x10>
$
So bl's argument is actually an offset. It's negative because we're going backwards. BUT, as the documentation says:
"For B, BL, CBNZ, and CBZ instructions, the value of the PC is the address of the current instruction plus 4 bytes."
So bl #-4 will jump to (the address of bl) + 4 bytes - 4 bytes, or, in other words, itself, again!
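To double-check that arithmetic, I also decoded the bytes by hand. The sketch below assumes I've read the 32-bit Thumb BL bit layout correctly (S, imm10, J1, J2 and imm11 fields, with the offset applied to the instruction address plus 4). The helper name `thumb_bl_target` is my own, and the second byte string, ff f7 fa ff, is an encoding I worked out by hand for an offset of -12, not something any of the tools produced.

```python
def thumb_bl_target(encoding: bytes, address: int) -> int:
    """Return the branch target of a 32-bit Thumb BL located at `address`."""
    # The instruction is two little-endian halfwords.
    hw1 = int.from_bytes(encoding[0:2], "little")
    hw2 = int.from_bytes(encoding[2:4], "little")
    if (hw1 & 0xF800) != 0xF000 or (hw2 & 0xD000) != 0xD000:
        raise ValueError("not a 32-bit Thumb BL encoding")

    s = (hw1 >> 10) & 1
    imm10 = hw1 & 0x3FF
    j1 = (hw2 >> 13) & 1
    j2 = (hw2 >> 11) & 1
    imm11 = hw2 & 0x7FF

    # I1 = NOT(J1 XOR S), I2 = NOT(J2 XOR S)
    i1 = 1 - (j1 ^ s)
    i2 = 1 - (j2 ^ s)

    # Offset is S:I1:I2:imm10:imm11:0, sign-extended from 25 bits.
    offset = (s << 24) | (i1 << 23) | (i2 << 22) | (imm10 << 12) | (imm11 << 1)
    if s:
        offset -= 1 << 25

    # PC reads as the address of the current instruction plus 4.
    return address + 4 + offset


# Bytes that Keystone (and the unlinked Clang object) emitted for `bl start` at 0x8
print(hex(thumb_bl_target(bytes.fromhex("fff7feff"), 8)))  # 0x8
# Hand-computed encoding for offset -12 (my assumption of what I expected to see)
print(hex(thumb_bl_target(bytes.fromhex("fff7faff"), 8)))  # 0x0
```

If this decoding is right, the emitted bytes really do encode a branch to address 8, so the emulator and the disassembler are only reporting what the assembler produced.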
So, I can't bl backwards for some reason? What's happening here, and how do I fix it?