
In GNU AS I understand it's possible to use Unified Syntax and sometimes get ARM code to assemble as Thumb code automagically. In some cases this can produce impressive gains in code density, and in many cases it just doesn't work because there's some ARM instruction that's impossible to do in Thumb mode.

What I'd like is some way for GNU AS to "fall back" to ARM when attempting to assemble a function block in Thumb mode. So, if there's an instruction that doesn't work in Thumb, the function is assembled in ARM mode, but if it works, then I get the code shrinkage.

Without having to annotate each of some 50,000 stub functions.

I've tried Googling and haven't found ANYTHING, so any help would be appreciated.

EDIT:

Thanks to the input I was able to get a makefile that tries to build Thumb first and falls back successfully to ARM. It's slow right now, but it works. Very pleased, thanks for all the input.

  • It seems unlikely that this would help much, even if possible. An instruction as simple as `add r0, r1` would force you back into ARM mode, because Thumb would have to substitute `adds r0, r1` and the assembler can't know whether that is safe. – Nate Eldredge Sep 29 '21 at 13:36
  • no need to google, just look at the assembler options. I can't imagine why this would ever be a feature they would add. – old_timer Sep 29 '21 at 13:40
  • one would have to ask why you have 50,000 stub functions before running into this. – old_timer Sep 29 '21 at 13:41
  • @NateEldredge: Thumb can actually encode `4408 add r0, r1`, vs. `1840 adds r0, r0, r1`. This may be Thumb2-only, but it seems to work with `arm-none-eabi-gcc -c -mcpu=cortex-m0 foo.s`. There are probably other examples, though. An assembler could maybe have an option to warn / tell you about the showstopper instructions, especially in cases where the flag-setting version would allow Thumb. – Peter Cordes Sep 29 '21 at 13:44
  • @old_timer it's my own code emitter for a 68000 emulator; every single opcode is broken out into its own stub routine; my unofficial count is 52,919 valid opcode permutations. – Renee Cousins Sep 29 '21 at 16:53
  • @ReneeCousins Have you tried changing your code emitter so it only generates valid thumb instructions? Seems like a no brainer tbh. – fuz Sep 29 '21 at 17:52
  • @fuz it's an option; though Thumb is a little more starved for registers than ARM. – Renee Cousins Sep 30 '21 at 04:21
  • @ReneeCousins With Thumb2 you can use all 16 registers, but most instructions using the high registers R8–R15 have a 4 byte encoding. – fuz Sep 30 '21 at 08:29
  • @PeterCordes The `4408` instruction is encodable in Thumb1, but it was for a long time marked as “UNPREDICTABLE.” Only with Thumb2 was it explicitly permitted to have two low registers as operands, but I am unaware of any Thumb1 processors that didn't support this. – fuz Sep 30 '21 at 08:31

1 Answer


It would be plausible for an assembler to have this feature, but GAS doesn't. GAS is designed as a one-pass assembler so it doesn't like to back-track. (For x86 it does do branch-displacement optimization, so it can make multiple passes over its internal data structure representing the code before emitting it. That might or might not be sufficient to add such a feature.)

There are some Thumb-only instructions like `tbb`. If a function can't be assembled in either Thumb or ARM mode, the assembler would have to report an error for it. But this would still be a useful and desirable behaviour for some use-cases, and for old code that was originally written for ARM-only, you wouldn't run into this problem.
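
As a concrete (hedged) illustration, this is the kind of `tbb` dispatch that has no ARM-mode equivalent at all; the names are invented, but the byte-table idiom is standard, and a function built around it could never be re-assembled as ARM without rewriting the dispatch:

```
    .syntax unified
    .thumb
    .thumb_func
switch3:                        @ invented name: dispatch on r0 = 0..2
    tbb     [pc, r0]            @ Thumb-2 only: branches forward by 2 * table[r0]
.Ltab:                          @ byte table of (target - .Ltab) / 2 offsets
    .byte   (.Lcase0 - .Ltab) / 2
    .byte   (.Lcase1 - .Ltab) / 2
    .byte   (.Lcase2 - .Ltab) / 2
    .p2align 1                  @ re-align to a halfword boundary after the byte table
.Lcase0:
    movs    r0, #0
    bx      lr
.Lcase1:
    movs    r0, #1
    bx      lr
.Lcase2:
    movs    r0, #2
    bx      lr
```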


Part of the problem would be knowing where functions end. A MASM-style `proc` / `endp` model would make that possible, but ARM assembly (GAS or Keil/ARMASM) doesn't work that way; functions just start at a label, with nothing marking where they end.

You could introduce a new directive like .auto_func (vs .thumb_func and .arm_func), and treat any of those three as boundaries between functions for this hypothetical feature.
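
To make the idea concrete, a source file under this hypothetical scheme might look like the following sketch. Only `.syntax unified` and `.thumb_func` are real GAS directives here; `.auto_func`, the stub names, and the bodies are made up for illustration:

```
    .syntax unified
    .text

    .thumb_func                 @ real GAS directive: this stub is definitely Thumb
    .global stub_0000           @ invented name
stub_0000:
    adds    r0, r0, #1          @ 16-bit Thumb encoding
    bx      lr

    .auto_func                  @ hypothetical: assemble as Thumb if possible, else ARM
    .global stub_0001           @ invented name
stub_0001:
    add     r0, r1, r2, lsl r3  @ operand2 shifted by a register: ARM-only encoding
    bx      lr                  @ so this whole stub would fall back to ARM mode
```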

You'd also want something to warn you when an innocent-seeming instruction caused a whole function to fall back to ARM, like an `add r0, #123` instead of `adds r0, #123`.

That `add` is encodeable with a 32-bit Thumb-2 encoding, but only on CPUs that support Thumb-2, e.g. not Cortex-M0. M0 doesn't support ARM mode at all, but it's an easy CPU to remember as (mostly) not supporting Thumb-2 when testing how things assemble.
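
For example, these two instructions differ only in the flag-setting `s`, but only the first one has a 16-bit Thumb encoding. Assembling a small test like this as Thumb (e.g. with `arm-none-eabi-gcc -c` as in the comments above, then `arm-none-eabi-objdump -d`) shows the size difference:

```
    .syntax unified
    .thumb
    adds    r0, r0, #123        @ flag-setting: 16-bit encoding, available even on Cortex-M0
    add     r0, r0, #123        @ non-flag-setting: only a 32-bit Thumb-2 encoding exists
```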


I originally misunderstood the question. The part of my answer below points out that it's not viable (or a good idea) to mix the two modes inside a single function.

Switching between ARM and Thumb on a per-instruction basis inside functions would be impossible, or at best grossly inefficient, whenever you used instructions that aren't encodeable as 16-bit Thumb (or 32-bit Thumb-2, which significantly expands what you can do in Thumb mode).

Switching the CPU between decoding in Thumb and ARM modes requires a "Thumb interworking" branch instruction like `bx <reg>` or `blx <relative address>`, so every ARM-only instruction would require two extra branch instructions (except when multiple ARM instructions are back to back, or, for a less naive assembler, when there are only 1 or 2 Thumb instructions between ARM instructions and it's not worth switching back).

So correctness (but not performance) is achievable for straight-line decoding (although maybe clobbering lr if there isn't a Thumb interworking branch that takes a relative address without also setting LR as a return address). If it requires clobbering a register like lr, that's not even really correct vs. the asm as written. You'd have to consider lr as being like MIPS $at (assembler temporary) that the assembler can use as a scratch while expanding your source-code pseudo-instructions into multiple machine instructions.
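
To make that concrete, here is roughly what a naive per-instruction fallback would have to expand a single ARM-only instruction into. This is an untested sketch: the function and label names are invented, and `rsc` is just one example of an instruction with no Thumb encoding:

```
    .syntax unified
    .global demo                @ invented name, for illustration only
    .thumb_func
demo:
    push    {lr}                @ the blx below clobbers lr, so save it first
    movs    r0, #1              @ ordinary 16-bit Thumb instruction
    blx     .Larm_island        @ immediate BLX from Thumb always switches to ARM state
    adds    r0, #1              @ execution resumes here, back in Thumb state
    pop     {pc}                @ return to the caller through the saved lr value

    .p2align 2                  @ ARM instructions must be 4-byte aligned
    .arm
.Larm_island:
    rsc     r0, r1, r2          @ RSC has no Thumb/Thumb-2 encoding: ARM mode only
    bx      lr                  @ bit 0 of lr is set, so this returns to Thumb state
```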

Conditional branches, and jump tables, can all work, possibly using `it eq` / `blxeq <target>` or something to emulate a `beq` if the target instruction is ARM and the branch is in a Thumb-mode block. Jump tables can take label addresses as addr+1 for Thumb-mode targets. But that would mean you couldn't use `tbb` and `tbh` instructions at all, unless every target was also in Thumb mode, because they don't do interworking and they'd need a register to emulate correctly.
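
A mixed-mode jump table could look something like this sketch (untested; `dispatch` and the handler names are invented). It relies on a plain `bx`, which interworks based on bit 0 of the loaded address, and that is exactly what `tbb` / `tbh` can't do:

```
    .syntax unified
    .thumb
    .thumb_func
dispatch:                        @ r0 = index; handlers return straight to dispatch's caller
    ldr     r1, =.Ltable         @ literal-pool load of the table address
    ldr     r1, [r1, r0, lsl #2] @ 32-bit Thumb-2 load with a shifted register offset
    bx      r1                   @ bit 0 of the loaded address selects ARM vs Thumb state

    .p2align 2
.Ltable:
    .word   thumb_handler + 1    @ +1 sets the Thumb bit by hand (plain label below)
    .word   arm_handler          @ even address: bx enters this target in ARM state

thumb_handler:                   @ still in .thumb from above
    bx      lr

    .p2align 2
    .arm
arm_handler:
    bx      lr
```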

So the only thing you'd have real trouble doing correctly would be computed jumps where some targets are in different modes (like `add r0, pc, r1` / `bx r0`). The assembler wouldn't be able to generate code to fix up the computed address. So it's possible to write code that would defeat an attempt to use Thumb as much as possible.

Of course all of this is a non-starter for performance reasons, even if it were possible without clobbering lr, so working out the limits of achievable correctness has been a fun thought experiment in silly computer tricks. :P

Doesn't hurt to ask, but turns out there's a good reason why you didn't find anything.

Peter Cordes
  • I thought what OP wanted was that if a non-Thumb instruction was encountered, the assembler would start over and assemble the entire function in ARM mode. I doubt that this exists either (partly because how does an assembler know where a function begins and ends) but it's not unimaginable. – Nate Eldredge Sep 29 '21 at 13:29
  • @NateEldredge: *ohhh*, yeah that could make sense, and on a closer read of the question does seem a more likely interpretation. But yeah good point that it would depend on a MASM-style `proc` / `endp` model, not just labels at the top of functions. You could introduce a new directive like `.auto_func` (vs `.thumb_func` and `.arm_func`), and treat any of those three as boundaries between functions. – Peter Cordes Sep 29 '21 at 13:34
  • one could write a script around the tools to look for an error and try ARM mode, but I can't imagine why the tools would have a feature to do this for you – old_timer Sep 29 '21 at 13:38
  • @PeterCordes Not sure if this suffices since some instructions are thumb-only. So worst case, there is no mode in which the function can be assembled. – fuz Sep 29 '21 at 13:42
  • @fuz: Yeah, just like not-encodeable instructions, the assembler would have to tell you about not-encodeable functions. Seems reasonable. Thanks for pointing out that concern, though. Added that to my answer. – Peter Cordes Sep 29 '21 at 13:57
  • @NateEldredge yeah, the intention was not to switch modes on the fly, but per-function. I hadn't thought about lacking proper function delimiters, but that makes sense. Your `.auto_func` is interesting and could simply scope until the end of the file or the next `.thumb_func` or `.arm_func`. – Renee Cousins Sep 29 '21 at 16:50
  • @old_timer, ah this is a good idea; I'd need to split out each function into its own source file and have something in the makefile retry if the first attempt fails. That will make for a LOT of files, lol, but I like the idea – Renee Cousins Sep 29 '21 at 16:51
  • @ReneeCousins: After detecting on a per-function basis, the code that came from one source file can be combined back into a single source file with `.thumb_func` and `.arm_func` directives between each function. i.e. write a tool that does this for the functions of a single source file, then use it separately on each of your `.S` files. (including putting `.syntax unified` at the top of each file, probably.) So you just have some changes within each file, so the diff from this is not nearly as intrusive. – Peter Cordes Sep 29 '21 at 17:27