
I think I understand how this works, but since I'm going purely by documentation and intuition, I was hoping someone with more expertise could verify or correct my understanding.

The closest question I was able to find on Stack Overflow was this one:

How to get struct member offset from dwarf info?

It doesn't quite answer my question: it asks how to obtain the offset and doesn't explain how to interpret the information. (Also, I'm using pyelftools, which works a bit differently from the solution proposed in that post.)


Here is the example code I'm working with:

// hello.c
#include <stdio.h>

struct myStruct {
    int myNum;
    char myLetter;
    int myNum2;
};

struct badStruct {
    int testingNum;
};

void foo() {
    struct badStruct bad;
    bad.testingNum = 17;
    struct myStruct s2;
    s2.myNum2 = 27;
    s2.myNum = 10;
}

int main()
{
    struct myStruct s1;
    s1.myNum = 0;
    s1.myNum2 = 17;
    s1.myLetter = 'a';
    int non_struct = 71;
    printf("Hello World %d %d\n", s1.myNum2, non_struct);
    return 0;
}

This is compiled with the following command: gcc -save-temps -masm=intel -gdwarf-2 -O0 hello.c -o hello.out

Here is the relevant part of the generated assembly in Intel syntax (from the .s file kept by -save-temps), with most of the debug directives omitted:

main:
    endbr64
    push    rbp
    mov rbp, rsp
    .cfi_def_cfa_register 6
    sub rsp, 16
    mov DWORD PTR -12[rbp], 0
    mov DWORD PTR -4[rbp], 17
    mov BYTE PTR -8[rbp], 97

Running objdump --dwarf=info ./hello.out gives a lot of useful information; the relevant (abridged) part is:

 <1><71>: Abbrev Number: 5 (DW_TAG_structure_type)
    <72>   DW_AT_name        : (indirect string, offset: 0x125): myStruct
    <76>   DW_AT_byte_size   : 12
...
 <2><7e>: Abbrev Number: 6 (DW_TAG_member)
    <7f>   DW_AT_name        : (indirect string, offset: 0x15d): myNum
...
    <8a>   DW_AT_data_member_location: 2 byte block: 23 0      (DW_OP_plus_uconst: 0)
 <2><8d>: Abbrev Number: 6 (DW_TAG_member)
    <8e>   DW_AT_name        : (indirect string, offset: 0xc1): myLetter
...
    <99>   DW_AT_data_member_location: 2 byte block: 23 4       (DW_OP_plus_uconst: 4)
 <2><9c>: Abbrev Number: 6 (DW_TAG_member)
    <9d>   DW_AT_name        : (indirect string, offset: 0x113): myNum2
...
    <a8>   DW_AT_data_member_location: 2 byte block: 23 8       (DW_OP_plus_uconst: 8)
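The DW_OP_plus_uconst operands above should be the same numbers that C's offsetof macro reports for each member. A minimal sketch to cross-check them, assuming a typical x86-64 target where int is 4 bytes and 4-byte aligned (print_member_offsets is an illustrative name, not from the original code):

```c
#include <stddef.h>
#include <stdio.h>

/* Same definition as in hello.c. */
struct myStruct {
    int myNum;
    char myLetter;
    int myNum2;
};

/* Prints each member's offset from the start of the struct. On the assumed
   target these should match the DW_OP_plus_uconst operands (0, 4, 8). */
void print_member_offsets(void)
{
    printf("myNum    %zu\n", offsetof(struct myStruct, myNum));
    printf("myLetter %zu\n", offsetof(struct myStruct, myLetter));
    printf("myNum2   %zu\n", offsetof(struct myStruct, myNum2));
}
```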

Based on the DWARF 2 specification (pages 19 and 42) at https://dwarfstd.org/doc/dwarf-2.0.0.pdf, the DW_AT_byte_size of myStruct is 12. Does this mean that every function that allocates a struct myStruct object will place it in the stack frame at offset -12 (i.e., at rbp-12)?

DW_AT_byte_size is described as:

If the size of an instance of the structure type, union type, or class type entry can be determined statically at compile time, the entry has a DW_AT_byte_size attribute, whose constant value is the number of bytes required to hold an instance of the structure, union, or class, and any padding bytes.
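For myStruct, the "padding bytes" part of that definition is what turns 9 bytes of member data (4 + 1 + 4) into a size of 12: three padding bytes follow myLetter so that myNum2 stays 4-byte aligned. A small sketch that computes this, under the same x86-64 assumptions (padding_after_myLetter and unpadded_size are hypothetical helper names):

```c
#include <stddef.h>

struct myStruct {
    int myNum;
    char myLetter;
    int myNum2;
};

/* Bytes of padding the compiler placed between myLetter and myNum2. */
size_t padding_after_myLetter(void)
{
    size_t end_of_letter = offsetof(struct myStruct, myLetter) + sizeof(char);
    return offsetof(struct myStruct, myNum2) - end_of_letter;
}

/* Sum of the member sizes, i.e. what the struct would need with no padding. */
size_t unpadded_size(void)
{
    return sizeof(int) + sizeof(char) + sizeof(int);
}
```

sizeof(struct myStruct) should then equal DW_AT_byte_size (12 on the assumed target).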


Each member's offset can then be found from the DW_OP_plus_uconst operation in its DW_AT_data_member_location, which the specification illustrates with this example:

A structure member is four bytes from the start of the structure instance. The base address is assumed to be already on the stack.
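The "2 byte block: 23 N" in the objdump output is exactly that one-operation expression: 0x23 is the DW_OP_plus_uconst opcode and N is its operand, encoded as an unsigned LEB128. A minimal sketch of evaluating such a block, assuming the block contains only DW_OP_plus_uconst operations (read_uleb128 and eval_member_location are hypothetical names, not pyelftools functions, and there is no bounds checking for malformed input):

```c
#include <stddef.h>
#include <stdint.h>

#define DW_OP_plus_uconst 0x23

/* Decodes an unsigned LEB128 value starting at *p and advances *p past it. */
static uint64_t read_uleb128(const uint8_t **p)
{
    uint64_t result = 0;
    unsigned shift = 0;
    uint8_t byte;
    do {
        byte = *(*p)++;
        result |= (uint64_t)(byte & 0x7f) << shift;
        shift += 7;
    } while (byte & 0x80);
    return result;
}

/* Evaluates a DW_AT_data_member_location block. As the spec says, the
   struct's base address is assumed to be on the stack already, so `base`
   seeds the single stack entry. Returns 0 on success, -1 if the block
   contains any opcode other than DW_OP_plus_uconst. */
int eval_member_location(const uint8_t *block, size_t len,
                         uint64_t base, uint64_t *out)
{
    const uint8_t *p = block;
    const uint8_t *end = block + len;
    uint64_t top = base;
    while (p < end) {
        uint8_t op = *p++;
        if (op != DW_OP_plus_uconst)
            return -1;
        top += read_uleb128(&p);
    }
    *out = top;
    return 0;
}
```

For myNum2 the block is {0x23, 0x08}, so an instance starting at address A yields A + 8.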

Therefore, myNum is at rbp-12 + 0 = rbp-12, which matches mov DWORD PTR -12[rbp], 0 in the assembly. Likewise, myLetter is at rbp-12 + 4 = rbp-8 (mov BYTE PTR -8[rbp], 97), and myNum2 is at rbp-12 + 8 = rbp-4 (mov DWORD PTR -4[rbp], 17).
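That arithmetic (the struct's frame offset plus the member offset from DWARF) can be written out directly. A sketch, assuming the base of rbp-12 observed for s1 in this particular -O0 build; note that this base is read from the assembly, not from the struct's DWARF entry, and member_frame_offset and S1_BASE are hypothetical names:

```c
#include <stddef.h>

struct myStruct {
    int myNum;
    char myLetter;
    int myNum2;
};

/* Frame offset of s1 relative to rbp, as observed in main's assembly
   (an assumption about this one build, not a rule). */
#define S1_BASE (-12L)

/* Frame offset of a member = frame offset of the struct + member offset. */
long member_frame_offset(long struct_base, size_t member_offset)
{
    return struct_base + (long)member_offset;
}
```

With S1_BASE this gives -12, -8 and -4 for myNum, myLetter and myNum2, matching the three stores in main.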

Is my understanding of how DWARF information and structs work correct? If anything is wrong, please let me know and I will look into it further. I searched through many sources on how structs are handled in assembly and described in DWARF to reach this conclusion, but I may be missing something or describing something incorrectly.


Edit: Apologies, I forgot to add an example for my question above about the offset of -12. For the foo function, the generated assembly looks like this:

foo:
    endbr64
    push    rbp
    mov rbp, rsp
    mov DWORD PTR -16[rbp], 17
    mov DWORD PTR -4[rbp], 27
    mov DWORD PTR -12[rbp], 10

Using the same offset information from DWARF, myNum2 is again at rbp-4 (mov DWORD PTR -4[rbp], 27), even in a different function, foo.
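Going the other way, from a store in the assembly back to a member name, is the same calculation inverted: subtract the variable's base and look the result up among the member offsets. A sketch using the offsets from the DWARF dump, with the bases (-12 for s2 and -16 for bad) taken from foo's assembly as assumptions; struct member_info and member_at are illustrative names only:

```c
#include <stddef.h>

/* One DW_TAG_member: its name and its DW_OP_plus_uconst operand. */
struct member_info {
    const char *name;
    long offset;
};

/* Member offsets copied from the objdump --dwarf=info output. */
static const struct member_info myStruct_members[] = {
    {"myNum", 0}, {"myLetter", 4}, {"myNum2", 8},
};
static const struct member_info badStruct_members[] = {
    {"testingNum", 0},
};

/* Returns the name of the member that starts at frame offset `addr` for a
   variable whose first byte is at frame offset `base`, or NULL if none does. */
const char *member_at(const struct member_info *members, size_t count,
                      long base, long addr)
{
    for (size_t i = 0; i < count; i++)
        if (base + members[i].offset == addr)
            return members[i].name;
    return NULL;
}
```

For example, the store to -4[rbp] with s2 at -12 resolves to myNum2, and the store to -16[rbp] with bad at -16 resolves to testingNum.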

In foo, I also added another struct variable, bad (of type struct badStruct), declared before the myStruct variable, and its member testingNum gets the next available offset, rbp-16 (since myStruct is 12 bytes). The only complication is that the DWARF information for badStruct gives just its size and the member offset within the struct:

 <1><ac>: Abbrev Number: 5 (DW_TAG_structure_type)
    <ad>   DW_AT_name        : (indirect string, offset: 0x109): badStruct
    <b1>   DW_AT_byte_size   : 4
 <2><b9>: Abbrev Number: 6 (DW_TAG_member)
    <ba>   DW_AT_name        : (indirect string, offset: 0x11a): testingNum
    <c5>   DW_AT_data_member_location: 2 byte block: 23 0       (DW_OP_plus_uconst: 0)

It seems I would need to keep adding up the DW_AT_byte_size values as I walk through the list of structs, but I think that's a minor concern, since the information appears to be provided in order.
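The structure entries only describe the layout inside each struct; neither one says where bad or s2 lives in the frame. What the observed placement does satisfy is that each variable occupies its DW_AT_byte_size bytes below rbp without overlapping the other, and many different placements would satisfy that too. A sketch of that check, with the bases taken from foo's assembly as assumptions (struct frame_slot and slots_valid are hypothetical names):

```c
#include <stdbool.h>
#include <stddef.h>

/* A local variable's placement: frame offset of its first byte relative
   to rbp, and its size (DW_AT_byte_size). */
struct frame_slot {
    const char *name;
    long base;
    long size;
};

/* True if every slot lies entirely below rbp and no two slots overlap. */
bool slots_valid(const struct frame_slot *slots, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        long a_end = slots[i].base + slots[i].size;
        if (a_end > 0)
            return false;                  /* would reach rbp or above */
        for (size_t j = i + 1; j < count; j++) {
            long b_end = slots[j].base + slots[j].size;
            if (slots[i].base < b_end && slots[j].base < a_end)
                return false;              /* byte ranges overlap */
        }
    }
    return true;
}
```

Both the observed layout ({"bad", -16, 4}, {"s2", -12, 12}) and a hypothetical alternative such as bad at -32 with s2 at -20 pass this check, so the sizes alone do not pin down the frame offsets.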

  • The text in bold is a bit misleading. There's nothing magic about the number 12. That could be any number that's at least 12. You should be able to demonstrate that by moving the `non_struct` variable declaration to the top of `main`. Or just add a second struct, since only one of them can be at `rbp-12` – user3386109 Aug 14 '23 at 19:55
  • @user3386109 thank you for your reply; I have added an example in the edit to provide an example for my bolded statement. Hopefully, that's what you meant. – Jay Aug 14 '23 at 20:06
