After the following code is compiled with GCC, is it possible to derive the lengths of the "teststr" and "testarray" variables from the DWARF debug information?

void func(char *str1, char *str2){
     ...
     ...
     return;
}

int main(void){
     char *teststr = "123456";
     char testarray[6] = "123456";

     func(teststr, testarray);     
     ....
     ....

     return 0;
}
Qi Zhang

1 Answer

DWARF is pretty much a representation of what the compiler knows about the source program. So the general answer to this question is: if you can find the size "locally" in the source, then yes; otherwise no.

But in this case, if we remove the `...` lines from your program and compile it with debug information (for example with `-g`), we can read the DWARF directly using readelf.

Here's what testarray in main looks like:

 <2><5c>: Abbrev Number: 3 (DW_TAG_variable)
    <5d>   DW_AT_name        : (indirect string, offset: 0x5): testarray
    <61>   DW_AT_decl_file   : 1
    <62>   DW_AT_decl_line   : 7
    <63>   DW_AT_type        : <0x7f>
    <67>   DW_AT_location    : 2 byte block: 91 60  (DW_OP_fbreg: -32)
...
 <1><7f>: Abbrev Number: 7 (DW_TAG_array_type)
    <80>   DW_AT_type        : <0x78>
    <84>   DW_AT_sibling     : <0x8f>
 <2><88>: Abbrev Number: 8 (DW_TAG_subrange_type)
    <89>   DW_AT_type        : <0x8f>
    <8d>   DW_AT_upper_bound : 5
 <2><8e>: Abbrev Number: 0
 <1><8f>: Abbrev Number: 6 (DW_TAG_base_type)
    <90>   DW_AT_byte_size   : 8
    <91>   DW_AT_encoding    : 7    (unsigned)
    <92>   DW_AT_name        : (indirect string, offset: 0x60): sizetype

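That is, it is an array of 6 characters: the subrange's upper bound is 5, and C arrays start at index 0, so there are 5 - 0 + 1 = 6 elements. So in this case you can find the length, which is exactly what you'd expect from reading the source.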
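The arithmetic behind that count can be sketched in C. This is only an illustration, not part of any DWARF library: the function name `array_length_from_bounds` is hypothetical, and the caller is assumed to have already read the bounds from the DW_TAG_subrange_type DIE, substituting the language default (0 for C) when DW_AT_lower_bound is absent.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: number of elements described by one
 * DW_TAG_subrange_type, given its lower and upper bounds.
 * For C, pass lower = 0 when the DIE has no DW_AT_lower_bound.
 */
uint64_t array_length_from_bounds(int64_t lower, int64_t upper)
{
    /* Both bounds are inclusive, so the count is upper - lower + 1. */
    assert(upper >= lower);
    return (uint64_t)(upper - lower) + 1;
}
```

For the dump above, `array_length_from_bounds(0, 5)` gives 6, matching `char testarray[6]`.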

However teststr and the variables in func look more like:

 <2><4e>: Abbrev Number: 3 (DW_TAG_variable)
    <4f>   DW_AT_name        : (indirect string, offset: 0xf): teststr
    <53>   DW_AT_decl_file   : 1
    <54>   DW_AT_decl_line   : 6
    <55>   DW_AT_type        : <0x72>
    <59>   DW_AT_location    : 2 byte block: 91 68  (DW_OP_fbreg: -24)
...
 <1><72>: Abbrev Number: 5 (DW_TAG_pointer_type)
    <73>   DW_AT_byte_size   : 8
    <74>   DW_AT_type        : <0x78>
 <1><78>: Abbrev Number: 6 (DW_TAG_base_type)
    <79>   DW_AT_byte_size   : 1
    <7a>   DW_AT_encoding    : 6    (signed char)
    <7b>   DW_AT_name        : (indirect string, offset: 0x7d): char

This says it is just a pointer-to-char -- in other words, the length isn't statically known. A debugger like gdb can find the length at runtime by reading memory from the inferior, just like your program would have to.
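The same distinction is visible from inside the C program itself. The sketch below is a minimal illustration with hypothetical function names: the array's length comes from its type via `sizeof`, while the length behind a pointer has to be found by scanning memory at run time, which is roughly what a debugger does when it reads the inferior. It assumes the pointer refers to a NUL-terminated string.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* The array length is part of the type, so it is known at compile time. */
size_t testarray_length(void)
{
    char testarray[6] = "123456";   /* exactly 6 chars, no terminating NUL */
    return sizeof testarray / sizeof testarray[0];
}

/*
 * A char * carries no length. The only way to get one is to walk memory
 * until the terminating NUL (assumes str points to a valid C string).
 */
size_t pointee_length(const char *str)
{
    assert(str != NULL);
    return strlen(str);
}
```

Note that `pointee_length` must not be called on `testarray` as declared in the question, because that array has no room for a terminating NUL.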

Tom Tromey
  • Thanks! Can you explain how we can tell that testarray is an array of 6 characters from the above DWARF info? If I guess correctly, we can tell from the "DW_AT_upper_bound". Then my question is: if there are multiple arrays in main, how can we match each "DW_AT_upper_bound" with its corresponding array? – Qi Zhang Oct 08 '16 at 02:46
  • It's all based on the type. First you can find the entry for the variable `testarray` - DIE 0x5c. Then look at the type of the array, DIE 0x7f. The type is an array type, so look at the bounds. The low bound is 0 (it isn't explicit but you can know it based on the CU's language) and the high bound is 5 -- so the length is 6. Other arrays in the function would have their own types with their own bounds. – Tom Tromey Oct 09 '16 at 02:00