
I'm modifying an assembler/linker to add an ARM7a backend. To interwork with a call (BL) there is BLX so if I see a global symbol with bit0 set I know to switch a BL to a BLX instruction.
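
A minimal sketch of that BL to BLX rewrite for ARM-state code, in C. The function name `arm_bl_to_blx` is hypothetical, and the sketch assumes the original BL is unconditional (the immediate form of BLX has no condition field) and that the target is within branch range:

```c
#include <stdint.h>

/* Hypothetical helper: produce the ARM-state BLX <imm> that replaces a BL
 * located at address `place`, calling the Thumb function at `target`.
 * Bit 0 of `target` (the Thumb marker on the symbol) is masked off.
 * Assumes an unconditional BL and an in-range target. */
uint32_t arm_bl_to_blx(uint32_t place, uint32_t target)
{
    /* In ARM state the PC reads as the instruction address plus 8. */
    uint32_t off   = (target & ~1u) - (place + 8u);
    uint32_t imm24 = (off >> 2) & 0x00ffffffu; /* word part of the offset */
    uint32_t h     = (off >> 1) & 1u;          /* halfword bit (H)        */

    /* BLX <imm>: 1111 101 H imm24 */
    return 0xfa000000u | (h << 24) | imm24;
}
```

The H bit is what lets an ARM-state BLX reach a Thumb function that is only 2-byte aligned.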

But for a plain branch there is no BX that takes an immediate (relative displacement); BX only accepts an address held in a register.

I don't see a single-instruction alternative, so I'm using a veneer: the branch goes to an LDR r0, [pc, #offset], which loads the symbol address from a .word, followed by BX r0.

But I can't believe this is really necessary for what must be a very common operation - branching to a function that happens to be written in thumb code. Linkers must be coping somehow with object code containing generic B instructions that turn out to be destined for thumb mode code.

So my question is: how are linkers handling this?

Peter Cordes
progman
  • That is exactly what gnu ld is doing. Have you looked at it? I don't think it is a terribly common operation, you usually `bl` functions except when using tail call. – Jester Jul 18 '22 at 00:23
  • You're looking for a tailcall branch-without-link with interworking? Other than tailcalls to other functions, you wouldn't normally jump to a different mode. (Especially in ARMv7a where you have Thumb2 so you can use 32-bit instructions when useful, so a hot loop doesn't have to suffer from limited instructions.) Note that `r0` holds the first function arg, so it's the least usable choice here. If there's a call-clobbered register other than `lr` and `r0..3`, use that. – Peter Cordes Jul 18 '22 at 00:26
  • Okay so gnu ld isn't *exactly* doing that since it uses the `ip` register which is specifically reserved for this purpose. – Jester Jul 18 '22 at 00:27
  • Why didn't you just try the linkers to see? – old_timer Jul 18 '22 at 14:34
  • Did you mean armv7a? – old_timer Jul 18 '22 at 14:58
  • Yes, armv7a. I have embarrassed myself - indeed I should have thought of forcing the issue with a linker or two. But thanks for doing the leg work - it's a more complete investigation than I would have achieved ... So with this confirmation of there being a rather unsatisfactory workaround, is this a glaring deficiency in the instruction set? ...
  • ... I'm limiting my assembler/linker to Arm mode to simplify things. It is implementing a backend to a functional language compiler and the very first non-trivial program I applied it to threw up this tailcall branch and it appears to be a very common operation. It is thus prohibitively costly doing this veneer and I wonder why there is no BX defined on a pc-relative immediate. The comments/answer look to be touching on this but I don't understand fully but I assume the lack of BX/imm is not considered a design error. – progman Jul 20 '22 at 00:18

1 Answer


so.c:

void thumb_fun(void);
void arm_fun ( void )
{
    thumb_fun();
}

x.s:

.thumb

    bl arm_fun
    b .

.globl thumb_fun
.thumb_func
thumb_fun:
    bx lr

Build and disassemble:

Disassembly of section .text:

00001000 <thumb_fun-0x6>:
    1000:   f000 f80a   bl  1018 <__arm_fun_from_thumb>
    1004:   e7fe        b.n 1004 <thumb_fun-0x2>

00001006 <thumb_fun>:
    1006:   4770        bx  lr

00001008 <arm_fun>:
    1008:   e92d4010    push    {r4, lr}
    100c:   eb000003    bl  1020 <__thumb_fun_from_arm>
    1010:   e8bd4010    pop {r4, lr}
    1014:   e12fff1e    bx  lr

00001018 <__arm_fun_from_thumb>:
    1018:   4778        bx  pc
    101a:   e7fd        b.n 1018 <__arm_fun_from_thumb>
    101c:   eafffff9    b   1008 <arm_fun>

00001020 <__thumb_fun_from_arm>:
    1020:   e59fc000    ldr ip, [pc]    ; 1028 <__thumb_fun_from_arm+0x8>
    1024:   e12fff1c    bx  ip
    1028:   00001007    .word   0x00001007
    102c:   00000000    .word   0x00000000

If I link with --use-blx:

Disassembly of section .text:

00008000 <thumb_fun-0x6>:
    8000:   f000 e802   blx 8008 <arm_fun>
    8004:   e7fe        b.n 8004 <thumb_fun-0x2>

00008006 <thumb_fun>:
    8006:   4770        bx  lr

00008008 <arm_fun>:
    8008:   ea000000    b   8010 <__thumb_fun_from_arm>
    800c:   00000000    andeq   r0, r0, r0

00008010 <__thumb_fun_from_arm>:
    8010:   e51ff004    ldr pc, [pc, #-4]   ; 8014 <__thumb_fun_from_arm+0x4>
    8014:   00008007    .word   0x00008007

I don't have an llvm build with its linker at hand right now (it takes an eternity to build). I would assume it is similar.

As answered in the comments, I think the ABI reserves a register (`ip`, i.e. r12) for things like this.

blx had issues in an early core, if I remember right, so the tools just did not use it.
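
A rough C sketch of how a linker might emit such a veneer, reusing the opcodes from the first gnu ld disassembly above (`ldr ip, [pc]` / `bx ip` / `.word`). The helper names `emit_arm_to_thumb_veneer` and `encode_arm_b` are hypothetical and not taken from any real linker:

```c
#include <stdint.h>

/* Hypothetical helper: write an ARM-to-Thumb veneer into out[0..2]. */
void emit_arm_to_thumb_veneer(uint32_t out[3], uint32_t thumb_target)
{
    out[0] = 0xe59fc000u;        /* ldr ip, [pc]  ; pc reads as veneer+8 */
    out[1] = 0xe12fff1cu;        /* bx  ip        ; bit 0 selects Thumb  */
    out[2] = thumb_target | 1u;  /* .word target with the Thumb bit set  */
}

/* Hypothetical helper: encode an unconditional ARM-state B at address
 * `place` so that it lands on `dest` (for example, the veneer).
 * Assumes `dest` is word-aligned and within branch range. */
uint32_t encode_arm_b(uint32_t place, uint32_t dest)
{
    uint32_t off = dest - (place + 8u);  /* PC reads as place + 8 */
    return 0xea000000u | ((off >> 2) & 0x00ffffffu);
}
```

The original `b thumb_fun` is then patched to branch to the veneer instead, which costs one extra load and one extra branch on that path.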

My clang build for armv4t completed on this machine.

Disassembly of section .text:

000200e4 <thumb_fun-0x6>:
   200e4:   f000 e802   blx 200ec <arm_fun>
   200e8:   e7fe        b.n 200e8 <thumb_fun-0x2>

000200ea <thumb_fun>:
   200ea:   4770        bx  lr

000200ec <arm_fun>:
   200ec:   eaffffff    b   200f0 <__ARMv5ABSLongThunk_thumb_fun>

000200f0 <__ARMv5ABSLongThunk_thumb_fun>:
   200f0:   e51ff004    ldr pc, [pc, #-4]   ; 200f4 <__ARMv5ABSLongThunk_thumb_fun+0x4>
   200f4:   000200eb    .word   0x000200eb

To get a good llvm linker you need to build a cross tool, not just take the prebuilt one for your platform. The last couple or so major revs have had problems cross-building anyway, so I have resorted to a gcc-style build for one specific architecture, which resolved a lot of my clang/llvm problems, other than having to build for each target and the time it takes to build.

I have not yet tried an armv7-a build for llvm; I suspect you would get the same result. Note that I did not try armv7-a for the gcc runs above either. As for what the linkers do: as you can see, you could easily have checked this yourself. But as answered in the comments, the linker generates a trampoline, or as I guess folks call it, a veneer.

llvm/clang armv7a:

Disassembly of section .text:

000200e4 <thumb_fun-0x6>:
   200e4:   f000 e802   blx 200ec <arm_fun>
   200e8:   e7fe        b.n 200e8 <thumb_fun-0x2>

000200ea <thumb_fun>:
   200ea:   4770        bx  lr

000200ec <arm_fun>:
   200ec:   eaffffff    b   200f0 <__ARMv7ABSLongThunk_thumb_fun>

000200f0 <__ARMv7ABSLongThunk_thumb_fun>:
   200f0:   e300c0eb    movw    ip, #235    ; 0xeb
   200f4:   e340c002    movt    ip, #2
   200f8:   e12fff1c    bx  ip
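
The movw/movt thunk above can be reproduced with a small C sketch. The function names `mov16` and `emit_movw_movt_thunk` are hypothetical, and this only illustrates the instruction encoding, not how the llvm linker is actually implemented:

```c
#include <stdint.h>

/* ARM-state MOVW/MOVT: the 16-bit immediate is split into imm4
 * (bits 19:16) and imm12 (bits 11:0); Rd sits in bits 15:12. */
static uint32_t mov16(uint32_t base, unsigned rd, uint32_t imm16)
{
    return base | ((imm16 >> 12) << 16) | ((uint32_t)rd << 12)
                | (imm16 & 0xfffu);
}

/* Hypothetical helper: write a movw/movt/bx thunk into out[0..2]. */
void emit_movw_movt_thunk(uint32_t out[3], uint32_t thumb_target)
{
    uint32_t addr = thumb_target | 1u;                /* set the Thumb bit */
    out[0] = mov16(0xe3000000u, 12, addr & 0xffffu);  /* movw ip, #lo16    */
    out[1] = mov16(0xe3400000u, 12, addr >> 16);      /* movt ip, #hi16    */
    out[2] = 0xe12fff1cu;                             /* bx   ip           */
}
```

Compared with the `ldr`-based veneer, this form needs no literal word and so avoids a data load, at the same total size of three words.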

With gnu tools and -march=armv7-a:

00001000 <thumb_fun-0x6>:
    1000:   f000 e802   blx 1008 <arm_fun>
    1004:   e7fe        b.n 1004 <thumb_fun-0x2>

00001006 <thumb_fun>:
    1006:   4770        bx  lr

00001008 <arm_fun>:
    1008:   ea000000    b   1010 <__thumb_fun_from_arm>
    100c:   00000000    andeq   r0, r0, r0

00001010 <__thumb_fun_from_arm>:
    1010:   e51ff004    ldr pc, [pc, #-4]   ; 1014 <__thumb_fun_from_arm+0x4>
    1014:   00001007    .word   0x00001007
halfer
old_timer
  • ARM does not have memory operands for ALU operations, etc. It is RISC-like and load/store-like, so this is not a surprise. Ideally you stick to one mode or the other as much as you can, without a lot of back and forth, and that has a lot to do with the programmer doing the build. – old_timer Jul 19 '22 at 14:12