This is my first attempt at ARM64 (Apple M1) assembly coding. I have basic 'hello world' code which assembles and runs correctly, but when I run it in lldb, only the first three lines are displayed in full source code format, like this:

Abenaki:hello jiml$ ~/llvm/clang+llvm-15.0.2-arm64-apple-darwin21.0/bin/lldb hello
(lldb) target create "hello"
Current executable set to '/Users/jiml/Projects/GitRepos/ARM/hello/hello/hello/hello' (arm64).
(lldb) b main
Breakpoint 1: where = hello`main + 4, address = 0x0000000100003f7c
(lldb) r
Process 5017 launched: '/Users/jiml/Projects/GitRepos/ARM/hello/hello/hello/hello' (arm64)
Process 5017 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 1.1
    frame #0: 0x0000000100003f7c hello`main at hello.s:19
   16   
   17   _main:
   18       mov     x0, #0x0            // stdout
-> 19       adrp    x1, msg@PAGE        // pointer to string
   20       add     x1, x1, msg@PAGEOFF
   21       ldr     x2, =msg_len        // bytes to output
   22       mov     x16, #0x04          // sys_write
warning: This version of LLDB has no plugin for the language "assembler". Inspection of frame variables will be limited.
(lldb) 

After three steps, the display reverts to bare object code like this:

(lldb) s
Process 5017 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = step in
    frame #0: 0x0000000100003f88 hello`main + 16
hello`main:
->  0x100003f88 <+16>: mov    x16, #0x4
    0x100003f8c <+20>: svc    #0x80
    0x100003f90 <+24>: adrp   x1, 1
    0x100003f94 <+28>: mov    x2, #0x0

dwarfdump -a shows that all source lines are present in the .o, and the behavior is the same with a .dSYM. The 'list' command in lldb, however, displays all source lines correctly.

Is this a known issue for LLVM (clang, lldb) development? Any help appreciated...

I have tried LLVM versions 14 and 15 and see the same behavior with both. I searched for similar issues but found nothing that helped.

I did find this https://stackoverflow.com/questions/73778648/why-is-it-that-assembling-linking-in-one-step-loses-debug-info-for-my-assembly-s but it did not solve my issue.

  • The source lines might all be present, but their ranges might not cover all of main, either because some code was compiler generated and not associated with a specific source line, or because of compiler bugs, particularly if you aren't building at -O0. You can see the source map with all the ranges by using `source info -f hello.s`. You can ask lldb what it knows about a particular address with `image lookup -va` followed by that address. That might give some insight into why this code doesn't have an associated source line number.
    – Jim Ingham Dec 03 '22 at 01:39
  • If that still seems wrong, it's best to file a report with the llvm bug reporter: https://github.com/llvm/llvm-project/issues/. Be sure to include your input .s file and your compile line and the versions of clang & lldb you are using. – Jim Ingham Dec 03 '22 at 01:41

1 Answer

So I think I have this resolved, but I am not sure if it is an actual compiler bug.

I wrote hello world in C, compiled and confirmed complete source display in lldb. I then reran clang with -S to generate the assembler source.

I then assembled that source...

clang -g -c -o hello.o hello.s
clang -o hello hello.o -lSystem -arch arm64

and confirmed it also runs in lldb with complete source display. Then I moved my hand-written code over line by line to figure out where the problem occurs. It seems my string data and length calculation are the problem. In the data section I originally had:

msg: .ascii "Hello ARM64\n"
msg_len = . - msg

Coming from the Intel world this seems perfectly natural ;-) Adding that length calculation caused some sort of corruption of the debug data. The executable has a proper OSO statement pointing at hello.o (nm -ap hello), and the object file has references for all source statements in the source file (dwarfdump --debug-line hello.o), but lldb still doesn't display source code after the third step. Curiously, 'source info -f hello.s' within lldb listed only four lines.

I found three workarounds. First, adding a label between the two statements seems to allow correct behavior:

msg: .ascii "Hello ARM64\n"
nothing:
msg_len = . - msg

Second, using the .equ directive:

msg: .ascii "Hello ARM64\n"
.equ msg_len, . - msg

Third, using two labels:

msg: .ascii "Hello ARM64\n"
msg_end:
msg_len = msg_end - msg

I will file a report with LLVM and see what they say.