
When I tried to get the content of the .debug_line section, I used readelf options such as decodedline to get a readable format. But when I dug into the details of the results, I could not understand why a single line number would be mapped to multiple starting addresses. How should these starting addresses be interpreted?

    File name                            Line number    Starting address    View    Stmt
    bof.c                                          6              0x1189               x
    bof.c                                          6              0x1189       1
    bof.c                                          7              0x118d               x
    bof.c                                          7              0x118d       1
    bof.c                                          8              0x1192        
    bof.c                                         10              0x1193               x
    bof.c                                         10              0x1193       1
    bof.c                                         10              0x119d       

The above is the output of readelf --debug=decodedline ./bof. Below are the source code and the corresponding assembly (Intel syntax) at those starting addresses.

Source code of bof:

1 #include <stdio.h>
2 #include <stdlib.h>
3 #include <string.h>
4 #include <stdbool.h>
5
6 bool check(int lenofbuf, int input) {
7     return input <= lenofbuf ? true : false;
8 }
9 
10 int main(int argc, char** argv) {
11     if (argc != 2) {
12         printf("Arguments: <buffer input>\n");
13         exit(1);
14     }
...

Assembly language:

0000000000001189 <check>:
check():
/home/xxx/Desktop/angr/research/bof/bof1-afterpatch/./bof.c:6
    1189:   f3 0f 1e fa             endbr64 
/home/xxx/Desktop/angr/research/bof/bof1-afterpatch/./bof.c:7
    118d:   39 fe                   cmp    esi,edi
    118f:   0f 9e c0                setle  al
/home/xxx/Desktop/angr/research/bof/bof1-afterpatch/./bof.c:8
    1192:   c3                      ret    

0000000000001193 <main>:
main():
/home/xxx/Desktop/angr/research/bof/bof1-afterpatch/./bof.c:10
    1193:   f3 0f 1e fa             endbr64 
    1197:   50                      push   rax
    1198:   58                      pop    rax
    1199:   48 83 ec 18             sub    rsp,0x18
    119d:   64 48 8b 04 25 28 00    mov    rax,QWORD PTR fs:0x28
    11a4:   00 00 
    11a6:   48 89 44 24 08          mov    QWORD PTR [rsp+0x8],rax
    11ab:   31 c0                   xor    eax,eax
/home/xxx/Desktop/angr/research/bof/bof1-afterpatch/./bof.c:11
    11ad:   83 ff 02                cmp    edi,0x2
    11b0:   75 49                   jne    11fb <main+0x68>
...

In the example above, line 10 is mapped to both 0x1193 and 0x119d. Can anyone explain why? Thanks.

Rafael
  • Most C statements take multiple asm instructions to implement, especially in a debug build where results have to get stored back into memory instead of kept in registers. So it makes total sense that multiple instructions (each with their own address) would map to the same source line. You can also look at source + asm on https://godbolt.org/. Apparently there's debug info for each instruction separately, or that's how it gets presented by `decodedline`. – Peter Cordes Jun 27 '21 at 00:51
  • Apparently the `119d` is after the function prologue while the `1193` is the actual first instruction of the function. The `119d` is probably for the opening brace. Try putting that on a separate line and see if that changes anything. – Jester Jun 27 '21 at 00:54
  • @PeterCordes Thanks for your response. Yes, it makes total sense that multiple instructions map to the same source line. However, my question is why `1193` and `119d` were chosen as starting addresses. Why aren't all addresses, including `1197` and `1198`, listed as starting addresses? Is there a reason for it? Thanks. – Rafael Jun 27 '21 at 02:41
  • Oh, I see, not every instruction has a separate entry after all. Probably Jester's right that different parts of the prologue are treated as separate blocks when generating debug info, and they just happen to be on the same C source line. What compiler / version / options generated this asm (and debug info) anyway? `push rax` / `pop rax` is completely useless and redundant, and not something GCC or clang normally generate even in debug mode. (It's also weird that function starts aren't aligned to 16-byte boundaries, but `gcc -falign-functions=0` will do that.) – Peter Cordes Jun 27 '21 at 02:47
  • @PeterCordes Yes, I also found that the asm generated without debug mode made more sense. However, we could not test the line number results for that build because it had no debug section. – Rafael Jun 27 '21 at 03:18
  • Debug info and optimization are orthogonal, at least with normal compilers like GCC and clang. Just use `-g -O2`. That's how https://godbolt.org/ matches up source lines with asm lines, and that works even with optimization enabled. (Each C statement doesn't necessarily map to a single block of instructions in that case, and there are some C statements with no associated asm instructions, but every asm instruction can be mapped to a source line.) – Peter Cordes Jun 27 '21 at 03:40
  • @PeterCordes I just found the answer in the DWARF documentation. The line number table is encoded as a program of opcodes, and the standard even gives a formula relating a special opcode to the line increment and address increment it applies. So an opcode can carry a line increment of 0 together with a nonzero address increment, which adds a new row for the same source line at a new address. I think that is why the same source line can appear with multiple starting addresses. If you are interested, you can get the documentation from [here](http://dwarfstd.org/Dwarf3Std.php); see section 6.2.5. – Rafael Jul 01 '21 at 01:41
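The comments above settle on the idea that not every instruction gets its own row. In a DWARF line table a row marks only where a mapping starts, and every address up to the next row inherits it, so `1197` and `1198` need no rows of their own. Below is a minimal C sketch of that lookup, assuming the rows have already been decoded into a sorted array; `struct line_row`, `rows` and `lookup` are hypothetical names, not part of any library, and the `0x11ad` row is assumed from the objdump listing rather than taken from the readelf excerpt.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One row of the decoded line table. Only the two columns needed for
   lookup are kept; File, View and Stmt are ignored in this sketch. */
struct line_row {
    uint64_t address;
    unsigned line;
};

/* Rows copied from the readelf output in the question, sorted by address.
   ASSUMPTION: the 0x11ad row for line 11 comes from the objdump listing
   (bof.c:11 starts at 11ad); it is not in the readelf excerpt. */
static const struct line_row rows[] = {
    {0x1189, 6},  {0x1189, 6},
    {0x118d, 7},  {0x118d, 7},
    {0x1192, 8},
    {0x1193, 10}, {0x1193, 10},
    {0x119d, 10},
    {0x11ad, 11},
};

/* A row only marks where a mapping starts: it covers every address from
   its own address up to the next row's address. So the row for addr is
   the last row whose address is <= addr. Returns NULL before the first
   row. End-of-sequence markers are not modeled, so addresses past the
   last row still map to it. */
static const struct line_row *lookup(uint64_t addr) {
    const struct line_row *hit = NULL;
    size_t n = sizeof rows / sizeof rows[0];
    for (size_t i = 0; i < n; i++) {
        /* the scan relies on the rows being sorted by address */
        assert(i == 0 || rows[i - 1].address <= rows[i].address);
        if (rows[i].address > addr)
            break;
        hit = &rows[i];
    }
    return hit;
}
```

With these rows, `lookup(0x1197)` and `lookup(0x1198)` both return the row that starts at `0x1193`, while `lookup(0x11a6)` returns the row that starts at `0x119d`. A real consumer such as a debugger also honors end-of-sequence markers, which this sketch skips.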
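Rafael's last comment refers to the special-opcode formula in section 6.2.5 of the DWARF 3 standard. The sketch below implements that formula in C. The header values (`line_base = -5`, `line_range = 14`, `opcode_base = 13`, `minimum_instruction_length = 1`) are assumptions, typical of GNU as output on x86-64; `readelf --debug-dump=rawline` shows the real ones for a given binary. `decode_special` and `encode_special` are hypothetical helper names.

```c
#include <assert.h>
#include <stdint.h>

/* Header fields of a DWARF line number program that the special-opcode
   formula depends on (DWARF 3, section 6.2.5.1). */
struct line_header {
    uint8_t min_inst_length;
    int8_t line_base;
    uint8_t line_range;
    uint8_t opcode_base;
};

/* ASSUMED example values, typical of GNU as on x86-64; read the real
   ones from your binary's line program header. */
static const struct line_header hdr = {1, -5, 14, 13};

/* Decode a special opcode into the address and line increments it
   applies. Each special opcode also appends a new row to the table. */
static void decode_special(const struct line_header *h, uint8_t opcode,
                           uint64_t *addr_inc, int *line_inc) {
    assert(opcode >= h->opcode_base); /* below this is a standard opcode */
    unsigned adjusted = (unsigned)(opcode - h->opcode_base);
    *addr_inc = (uint64_t)(adjusted / h->line_range) * h->min_inst_length;
    *line_inc = h->line_base + (int)(adjusted % h->line_range);
}

/* Inverse: the special opcode for a line increment and an operation
   advance (address advance divided by min_inst_length), or -1 if the pair
   does not fit in one special opcode; an assembler would then fall back
   to standard opcodes such as DW_LNS_advance_pc. */
static int encode_special(const struct line_header *h, int line_inc,
                          uint64_t op_advance) {
    if (line_inc < h->line_base || line_inc >= h->line_base + h->line_range)
        return -1;                    /* line step outside the window */
    if (op_advance > 255)
        return -1;                    /* cannot fit; also avoids overflow */
    uint64_t op = (uint64_t)(line_inc - h->line_base)
                + (uint64_t)h->line_range * op_advance + h->opcode_base;
    return op <= 255 ? (int)op : -1;
}
```

Under those assumed values, the step from the `0x1193` row to the `0x119d` row of line 10 (address advance 10, line advance 0) could be encoded as special opcode 158, and decoding 158 gives back exactly those increments. A zero line increment is just one more value the formula can produce, so nothing prevents one source line from starting several rows.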
