How to get qemu to run an arm thumb binary?

Question

I'm trying to learn the basics of ARM assembly and wrote a fairly simple program to sort an array. I initially assembled it using the armv8-a option and ran the program under qemu while debugging with gdb. This worked fine and the program initialized the array and sorted it as expected.

Ultimately I would like to be able to write some assembly for my Raspberry Pi Pico, which has an ARM Cortex M0+, which I believe uses the armv6-m option. However, when I change the directive in my code, it compiles fine but behaves strangely in that the program counter increments by 4 after every instruction instead of the 2 that I expect for thumb. This is causing my program to not work correctly. I suspect that qemu is trying to run my code as if it were compiled for the full ARM instruction set instead of thumb, but I'm not sure why this is.

I am running on Ubuntu Linux 20.04 LTS, using qemu-arm version 4.2.1 (installed from the package manager). Does the qemu-arm executable only run full ARM binaries? If so, is there another qemu package I can install to run a thumb binary?

Here is my code if it is helpful:

.arch armv6-m
.cpu cortex-m0plus

.syntax unified
.thumb

.data
arr: .skip 4 * 10
len: .word 10

.section .text
.global _start

.align 2
_start:
    ldr r0, arr_adr @ load the address of the start of the array into register 0
    movs r1, #0 @ clear the counter register
    movs r2, #100

init_loop:
    str r2, [r0,r1] @ store r2's value to the base address of the array plus the offset stored in r1
    subs r2, r2, #10 @ subtract 10 from r2
    adds r1, #4 @ add 4 to the offset (1 word in bytes)
    cmp r1, #40 @ check if we've reached the end of the array
    bne init_loop

    movs r1, #0 @ clear the offset
out_loop:
    mov r3, r1 @ set the index of the minimum value to the current array index

    mov r4, r1 @ set the inner loop index to the outer loop index

in_loop:
    ldr r5, [r0,r3] @ load the minimum index's value to r5
    ldr r6, [r0,r4] @ load the inner loop's next value to r6
    cmp r6, r5 @ compare the two values
    bge in_loop_inc @ if r6 is greater than or equal to r5, increment and restart loop
    mov r3, r4 @ set the minimum index to the current index
in_loop_inc:
    adds r4, #4
    cmp r4, #40 @ check if at end of array
    blt in_loop

    ldr r5, [r0,r3] @ load the minimum index value into r5
    ldr r6, [r0,r1] @ load the current outer loop index value into r6
    str r6, [r0,r3] @ swap the two values
    str r5, [r0,r1]

    adds r1, #4 @ increment outer loop index
    cmp r1, #40 @ check if at end of array
    blt out_loop

loop:
    nop
    b loop

arr_adr: .word arr

Thank you for your help!

The person who fixed the Cortex-M sims in QEMU was on Stack Overflow at one point; maybe they will show up again. One thing I know had to happen for the emulation to start up right, though it is not required on hardware: at least with the GNU tools, the linker script needed an entry point, and that entry point needed to be a Thumb function address. – old_timer, Jan 28 '22 at 00:13
And/or what does your disassembly look like: are you actually generating the right code? What if you build for cortex-m0, not plus? (I didn't know that string was supported like that; it really doesn't matter for code generation, both are armv6-m.) – old_timer, Jan 28 '22 at 00:14
And maybe try something even simpler, like a two-instruction loop that increments a register forever... – old_timer, Jan 28 '22 at 00:14

score 3 · Answer 1 · edited Feb 13 '22 at 14:59

memmap

MEMORY
{
    ram  : ORIGIN = 0x00000000, LENGTH = 32K
}

SECTIONS
{
   .text : { *(.text*) } > ram
}

strap.s

.cpu cortex-m0
.thumb
.syntax unified

.globl reset_entry
reset_entry:
    .word 0x20001000
    .word reset
    .word hang
    .word hang
    .word hang

.thumb_func
reset:
    ldr r0,=0x40002500
    ldr r1,=4
    str r1,[r0]
    ldr r0,=0x40002008
    ldr r1,=1
    str r1,[r0]

    ldr r0,=0x4000251C
    ldr r1,=0x30
    ldr r2,=0x37
loop_top:
    str r1,[r0]
    adds r1,r1,#1
    ands r1,r1,r2
    b loop_top

.thumb_func
hang:
    b hang

build

arm-linux-gnueabi-as --warn --fatal-warnings  strap.s -o strap.o
arm-linux-gnueabi-ld strap.o -T memmap -o notmain.elf
arm-linux-gnueabi-objdump -D notmain.elf > notmain.list

As a quick check, look at the vector table in the disassembly; the handler entries should be odd, because bit 0 set marks a Thumb address:

Disassembly of section .text:

00000000 <reset_entry>:
   0:   20001000    andcs   r1, r0, r0
   4:   00000015    andeq   r0, r0, r5, lsl r0
   8:   0000002f    andeq   r0, r0, pc, lsr #32
   c:   0000002f    andeq   r0, r0, pc, lsr #32
  10:   0000002f    andeq   r0, r0, pc, lsr #32

00000014 <reset>:
  14:   4806        ldr r0, [pc, #24]   ; (30 <hang+0x2>)
  16:   4907        ldr r1, [pc, #28]   ; (34 <hang+0x6>)
  18:   6001        str r1, [r0, #0]
  1a:   4807        ldr r0, [pc, #28]   ; (38

Looks good: the reset and hang entries (0x15 and 0x2f) are the function addresses with bit 0 set.
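The same check can be done mechanically. This is only a minimal sketch, assuming POSIX shell arithmetic; `vector_word` is a hypothetical helper name, and the two addresses are the ones visible in the listing above (reset at 0x14, hang at 0x2e). It prints the word a vector table entry should contain for a Thumb function at a given even address.

```shell
#!/bin/sh
# Hypothetical helper: the word a vector table should hold for a Thumb
# function located at the given (even) address, i.e. the address with
# bit 0 set.
vector_word() {
    printf '0x%08x\n' $(( $1 | 1 ))
}

# Addresses taken from the listing: reset is at 0x14, hang at 0x2e.
vector_word 0x14   # prints 0x00000015
vector_word 0x2e   # prints 0x0000002f
```

If a vector entry in the disassembly is even instead, the handler label is probably missing its .thumb_func marker.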

run it

qemu-system-arm -M microbit -nographic -kernel notmain.elf

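It will print 0123456701234567... until you press Ctrl-A and then X to exit QEMU.

Note that this binary will not work on a real chip, as I am cheating with the UART (for example, the loop never waits for the transmitter to be ready before writing the next character).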

You can get your feet wet with this sim. There is also a luminary micro one from the first cortex-m chips and you can limit yourself to armv6m instructions on that platform as well.

QEMU and sims like this have limited value for MCU work, since almost all of the work is related to peripherals and pins. The instruction set is just like the language of a book: French, Russian, English, German, it doesn't matter; a biology book is a biology book, and the book is the goal. The peripherals are specific to the chip (the Pico, a specific STM32 chip, a specific TI Tiva C chip, etc.).

edited Feb 13 '22 at 14:59 by halfer · answered Feb 02 '22 at 16:12 by old_timer

Sorry for the late reply and thank you very much for your answer. I was hoping to use the sim since I am unsure of how to use a debugger with the Pico, but I probably should sit down for a few hours when I have time and just figure it out. I think I'll probably switch to using the actual board. Thank you for your advice and suggestions! – Nathan S Feb 06 '22 at 15:31
The Pi folks have created an interesting solution there: if you use two Pico boards, one can be the debugger for the other. I have no use for gdb or other such tools, so I don't know what that experience is like for that product. The Pico is a very interesting product with some nice features (and some not so nice, of course). The documentation is not great, but then there is a very short list of well-documented products. There was enough there, though, that I could do what I needed. – old_timer Feb 06 '22 at 15:36
If you are just working through bare-metal basics, though, a sim is not so bad. For a Cortex-M you could create your own instruction set simulator over a weekend and learn the platform better than most. Remember, though, that the core processor is like the language of a book: you can print a biology book in German, Spanish or Russian, same content. You can have a very useful MCU using a Cortex-M, an MSP430, an AVR, MIPS, RISC-V, etc. as the core processor. There are existing products that had a Cortex-M before and swapped it out for a RISC-V, with register set, peripherals, all identical. – old_timer Feb 06 '22 at 15:38
Even Broadcom did that with the Raspberry Pi products: carved out and replaced the ARM processor a few times. – old_timer Feb 06 '22 at 15:38
Learning Cortex-M in general is not a bad choice professionally. RISC-V is a bit more of a risk, as it could go in a good direction, go bad, or just lose popularity; we will see. – old_timer Feb 06 '22 at 15:39
Bare metal, particularly on an MCU, is more painful than just learning to program in general: when you make even a tiny mistake (even as simple as the order of objects on a command line), you can hang the chip, brick the board beyond recovery, or brick it so that you have to get out special debug tools or sometimes solder things to the board. After decades I still brick boards from time to time. Mastering the toolchain and taking advantage of sims that are useful (some can be more pain than good) is a good way to at least not burn through hardware. – old_timer Feb 06 '22 at 15:42
Thank you for all of the insight, I will keep it in mind in the future! I guess I will be buying another Pico board to debug my current one. I have only been messing with MCUs for a little over a year and I have bricked several so far, so hopefully that is a trend that stops! – Nathan S Feb 06 '22 at 16:03

score 2 · Accepted Answer · answered Jan 28 '22 at 11:26

There are a couple of concepts to disentangle here:

(1) Arm vs Thumb: these are two different instruction sets. Most CPUs support both, some support only one. Both are available simultaneously if the CPU supports both. To simplify a little bit, if you jump to an address with the least significant bit set that means "go to Thumb mode", and jumping to an address with that bit clear means "go to Arm mode". (Interworking is a touch more complicated than that, but that's a good initial mental model.) Note that all Arm instructions are 4 bytes long, but Thumb instructions can be either 2 or 4 bytes long.

(2) A-profile vs M-profile: these are two different families of CPU architecture. M-profile is "microcontrollers"; A-profile is "applications processors", which is "(almost) everything else". M-profile CPUs always support Thumb and only Thumb code. A-profile CPUs support both Arm and Thumb. The Raspberry Pi Pico is a Cortex-M0+, which is M-profile.

(3) QEMU system emulation vs user-mode emulation: these are two different QEMU executables which run guest code in different ways. The system emulation binary (typically qemu-system-arm) runs "bare metal code", e.g. an entire OS. The guest code has full control and can handle exceptions, write to hardware devices, etc. The user emulation binary (typically qemu-arm) is for running Linux user-space binaries. Guest code is started in unprivileged mode and has access to the usual Linux system calls. For system emulation, which CPU is being emulated depends on what machine type you select with the -M or --machine option. For user-mode emulation, the default CPU is "A-profile with all supported features enabled" (this is --cpu max).

You're currently using qemu-arm which means you get user-mode emulation. This should support Thumb binaries, but unless you pass it a --cpu option it will be using an A-profile CPU. I would also suggest using a newer QEMU for M-profile work, because a lot of M-profile CPU bugs have been fixed since version 4.2. I think 4.2 is also too old to have the Cortex-M0 CPU.
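As a rough sketch of what passing a CPU option could look like: the helper below only prints the commands rather than running them, so they can be reviewed first. Everything specific here is an assumption for illustration: `user_mode_cmds` is a hypothetical helper name, `./sort.elf` is a placeholder file name, `gdb-multiarch` assumes that is the GDB you have installed, and the CPU name should be checked against the output of `qemu-arm -cpu help` for your QEMU build.

```shell
#!/bin/sh
# Hypothetical helper: print (not run) the commands for debugging a
# user-mode binary on a chosen CPU model.
#   $1 = CPU model name (check `qemu-arm -cpu help` for what your build offers)
#   $2 = path to the ELF file (placeholder name in the example below)
#   $3 = TCP port for the gdbstub (defaults to 1234)
user_mode_cmds() {
    cpu=$1
    elf=$2
    port=${3:-1234}
    # -cpu selects the emulated CPU; -g makes qemu-arm wait for a GDB
    # connection on the given port before running the guest.
    echo "qemu-arm -cpu $cpu -g $port $elf"
    echo "gdb-multiarch -ex 'target remote localhost:$port' $elf"
}

user_mode_cmds cortex-m0 ./sort.elf
```

The two printed commands are meant to be run in separate terminals, QEMU first.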

GDB should tell you in the PSR what the T bit is set to -- use that to check whether you're in Thumb mode or Arm mode, rather than looking at how much the PC is incrementing by.
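To make the bit position concrete: on an A-profile CPU the T bit is bit 5 of CPSR, while on M-profile it is bit 24 of xPSR. The sketch below decodes a raw register value such as the one GDB prints for `p/x $cpsr`; `t_bit` is a hypothetical helper name and the sample values are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: report the T (Thumb) bit of a PSR value.
#   $1 = profile: "a" (CPSR, T is bit 5) or "m" (xPSR, T is bit 24)
#   $2 = register value as printed by the debugger, e.g. 0x60000030
t_bit() {
    case $1 in
        a) shift_by=5 ;;
        m) shift_by=24 ;;
        *) echo "usage: t_bit a|m VALUE" >&2; return 2 ;;
    esac
    # Shift the T bit down to bit 0 and mask everything else off.
    echo "T=$(( ($2 >> shift_by) & 1 ))"
}

t_bit a 0x00000010   # prints T=0 (Arm state)
t_bit a 0x00000030   # prints T=1 (Thumb state)
t_bit m 0x01000000   # prints T=1 (Thumb state)
```

On M-profile, T=0 does not mean Arm state, since those CPUs only execute Thumb code; it indicates that something went wrong, typically a branch to an address without bit 0 set.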

There's currently no QEMU system emulation of the Raspberry Pi Pico (though somebody has been doing some experimental work on one). If your assembly is just basic "working with registers and a bit of memory" you can do that with the user-mode emulator. Or you can try the 'microbit' machine model, which is a Cortex-M0 board -- if you're not doing things that are specific to the Pi Pico that might be good enough.

Sorry for the late reply, I have been busy with school the last week. Thank you for the answer, I was a bit confused why a single CPU could have two instruction sets (my only other assembly experience is with AVR chips which are pretty straightforward). I tried a newer qemu (built from source) and added a .thumb_func line to my code, and it seems to work now. — Nathan S, Feb 06 '22 at 15:27
