
I'm trying to come up with a slick way of generating a symbol table from my compiled binary.

I generally work in embedded with a full-featured GNU toolchain, though I'm open to using system utilities (preferably Windows/MSYS2/Cygwin) to assist. My scripting language of choice is Python, as that is the language generally used at the company I work for.

For reference, the following post from ~4 years ago is almost exactly what I'm looking for. Given that a significant amount of time has passed, I was hoping there is now a simpler way to achieve this.

Extract detailed symbol information (struct members) from elf file compiled with ARM-GCC

I'm quite familiar with gdb and am used to using `info variables`, `p &name`, `ptype name`, etc. What I ultimately need is an input/output like the example below. I'll need to support all structs, unions, and enums, as well as deeply nested types (structs within structs within structs). I'm OK with stripping off all other decorations like static, volatile, atomic, etc. I'm not sure yet what to do with pointers, but it would be nice to append an asterisk to the type in the CSV output below.

Sample Code

uint64_t myU64;
int64_t my64;

typedef struct {
    uint8_t aaa;
    int8_t bbb;
} myStruct2_t;

struct {
    uint32_t a;
    int32_t b;
    float c;
    enum {
        E_ONE = 100,
        E_TWO = 200,
        E_THREE = 300
    } myEnum;
    union {
        uint16_t aa;
        int16_t bb;
    } myUnion;
    myStruct2_t myStruct2[3];
    uint32_t myArr[2];
} myStruct;

Desired Output

myU64, 0x8001918, uint64_t
my64, 0x800191C, int64_t
myStruct.a, 0x8001920, uint32_t
myStruct.b, 0x8001924, int32_t
myStruct.c, 0x8001928, float
myStruct.myEnum, 0x800192C, int16_t <-- Requires deeper digging for enum
myStruct.myUnion.aa, 0x800192E, uint16_t
myStruct.myUnion.bb, 0x800192E, int16_t
myStruct.myStruct2[0].aaa, 0x8001930, uint8_t
myStruct.myStruct2[0].bbb, 0x8001931, int8_t
myStruct.myStruct2[1].aaa, 0x8001932, uint8_t
myStruct.myStruct2[1].bbb, 0x8001933, int8_t
myStruct.myStruct2[2].aaa, 0x8001934, uint8_t
myStruct.myStruct2[2].bbb, 0x8001935, int8_t
myStruct.myArr[0], 0x8001938, uint32_t
myStruct.myArr[1], 0x800193C, uint32_t

Using the gdb commands listed above, I can get all of this information, but it would require writing an extremely sophisticated string parser. Any ideas? Are there existing tools, or an easy way to automate this? I'm OK with having to create a tool, but so far all my ideas involve a string-parsing monstrosity. I've looked briefly into gdb's Python API but haven't found examples that apply closely; maybe that is a route I could take too.
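GDB's Python API exposes the same information `ptype` prints as structured objects (`gdb.Type`, `gdb.Field`), so the recursive walk needs no string parsing. Below is a minimal sketch, assuming it is loaded with `source` into (arm-none-eabi-)gdb after the ELF is loaded. The `flatvars` command, the file name `flatten_vars.py`, the `flatten`/`format_row` helpers, and the enum-signedness heuristic are all hypothetical, and bit-fields are not handled.

```python
# flatten_vars.py: a minimal sketch for GDB's embedded Python interpreter.
# flatten() only uses duck-typed gdb.Type / gdb.Field attributes, so it can
# also be exercised outside GDB with stand-in objects.
try:
    import gdb
    # Map the gdb type codes we branch on onto plain strings.
    KIND = {
        gdb.TYPE_CODE_STRUCT: "struct",
        gdb.TYPE_CODE_UNION: "union",
        gdb.TYPE_CODE_ARRAY: "array",
        gdb.TYPE_CODE_ENUM: "enum",
    }
except ImportError:  # not running inside GDB
    gdb = None
    KIND = {}


def kind(t):
    """Return 'struct', 'union', 'array', 'enum', or the raw type code."""
    return KIND.get(t.code, t.code)


def flatten(name, addr, t, out):
    """Append (path, address, type-name) rows for every leaf under type t."""
    t = t.unqualified()          # drop const/volatile
    base = t.strip_typedefs()    # see through typedefs to the real layout
    k = kind(base)
    if k in ("struct", "union"):
        for f in base.fields():
            # Anonymous members (f.name is None) are flattened into the parent.
            path = f"{name}.{f.name}" if f.name else name
            # bitpos is in bits; union members all have bitpos 0.
            flatten(path, addr + f.bitpos // 8, f.type, out)
    elif k == "array":
        lo, hi = base.range()
        elem = base.target()
        # Multidimensional arrays recurse naturally: elem is itself an array.
        for i in range(lo, hi + 1):
            flatten(f"{name}[{i}]", addr + (i - lo) * elem.sizeof, elem, out)
    elif k == "enum":
        # Heuristic: report the enum as a fixed-width integer of its size.
        signed = any(f.enumval < 0 for f in base.fields())
        out.append((name, addr, f"{'' if signed else 'u'}int{8 * base.sizeof}_t"))
    else:
        # Scalars and pointers: str() keeps typedef names ("uint32_t") and
        # renders pointers with an asterisk ("uint8_t *").
        out.append((name, addr, str(t)))


def format_row(row):
    """Format one row like the CSV lines in the question."""
    name, addr, type_name = row
    return f"{name}, 0x{addr:X}, {type_name}"


if gdb is not None:
    class FlatVars(gdb.Command):
        """flatvars VAR [VAR...]: print one CSV line per leaf member."""

        def __init__(self):
            super().__init__("flatvars", gdb.COMMAND_DATA)

        def invoke(self, arg, from_tty):
            for var in gdb.string_to_argv(arg):
                value = gdb.parse_and_eval(var)
                rows = []
                flatten(var, int(value.address), value.type, rows)
                for row in rows:
                    gdb.write(format_row(row) + "\n")

    FlatVars()
```

Usage would be `source flatten_vars.py` followed by `flatvars myU64 my64 myStruct` at the `(gdb)` prompt. This sketch takes the variable names as arguments rather than enumerating every global; the names could come from `info variables` output. Multidimensional arrays fall out of the recursion instead of needing to be skipped.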

Also, while my focus has been to use gdb, I'm open to any other tool that can assist.

Thanks!

Ben S
  • [libelf](https://directory.fsf.org/wiki/Libelf) – KamilCuk Dec 27 '18 at 19:08
  • Offtopic, pure and simple. `Questions asking us to recommend or find a book, tool, software library, tutorial or other off-site resource are off-topic for Stack Overflow as they tend to attract opinionated answers and spam. Instead, describe the problem and what has been done so far to solve it.` – SergeyA Dec 27 '18 at 19:21

1 Answer


> slick way of generating a symbol table from my compiled binary.

Your compiled binary already has a symbol table. What you are trying to generate has nothing to do with what is normally called a symbol table, and using that term creates unnecessary confusion.
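For contrast, the existing symbol table can be listed with GNU `nm`. With `-S` (`--print-size`) it shows each global's address, size, and symbol-type letter, but nothing about struct members, which is why the member layout has to come from the debug info instead. Below is a minimal Python sketch that parses nm's default (BSD-style) output. The `arm-none-eabi-nm` prefix, the `firmware.elf` name, and the `NmSymbol`/`data_symbols`/`run_nm` helpers are assumptions for illustration.

```python
import re
import subprocess
from typing import List, NamedTuple, Optional


class NmSymbol(NamedTuple):
    name: str
    address: int
    size: Optional[int]   # None when nm prints no size for the symbol
    kind: str             # nm symbol-type letter, e.g. B = .bss, D = .data


# "08001918 00000008 B myU64" (with -S) or "08001918 B myU64" (no size)
_NM_LINE = re.compile(
    r"^([0-9A-Fa-f]+)\s+(?:([0-9A-Fa-f]+)\s+)?(\S)\s+(\S+)$")

# Data objects: .bss (b/B), .data (d/D), read-only data (r/R).
DATA_KINDS = set("bBdDrR")


def data_symbols(nm_output: str) -> List[NmSymbol]:
    """Parse BSD-format `nm -S` output and keep only data symbols, by address."""
    syms = []
    for line in nm_output.splitlines():
        m = _NM_LINE.match(line.strip())
        if not m or m.group(3) not in DATA_KINDS:
            continue  # undefined symbols (no address), functions, etc.
        addr, size, kind, name = m.groups()
        syms.append(NmSymbol(name, int(addr, 16),
                             int(size, 16) if size else None, kind))
    return sorted(syms, key=lambda s: s.address)


def run_nm(elf_path: str, nm: str = "arm-none-eabi-nm") -> List[NmSymbol]:
    """Run GNU nm with -S on the given ELF file and parse the result."""
    out = subprocess.run([nm, "-S", elf_path], capture_output=True,
                         text=True, check=True).stdout
    return data_symbols(out)
```

For example, `run_nm("firmware.elf")` would yield one entry for `myStruct` with its total size, with no `myStruct.a`, `myStruct.myUnion.aa`, and so on.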

What you are looking for is a dump of the debug info in a non-standard format (the standard format is DWARF, which is what GDB reads to produce the output of `ptype`).

To read DWARF debug info programmatically, use libdwarf.
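libdwarf is a C library. Since the question prefers Python, a comparable route is pyelftools, a separate pure-Python ELF/DWARF reader, swapped in here in place of libdwarf. The sketch walks the DWARF DIE tree. Top-level `DW_TAG_variable` entries whose location is `DW_OP_addr` give each global's address. `DW_AT_data_member_location`, `DW_TAG_subrange_type` bounds, and typedef/cv-qualifier chains give the nested layout. It assumes DWARF 4 or later (where member offsets are plain constants), a little-endian target such as Cortex-M, and no bit-fields. It also skips definitions described only via `DW_AT_specification`. The file and function names are illustrative, and the enum fallback type is a guess.

```python
# dwarf_flatten.py: minimal sketch using pyelftools (pip install pyelftools).
CV = ("DW_TAG_const_type", "DW_TAG_volatile_type")


def attr(die, name, default=None):
    """Value of a DIE attribute, or default if the attribute is absent."""
    a = die.attributes.get(name)
    return a.value if a is not None else default


def type_of(die):
    """Follow DW_AT_type to the referenced DIE (None means void)."""
    if "DW_AT_type" not in die.attributes:
        return None
    return die.get_DIE_from_attribute("DW_AT_type")


def strip(die, tags=CV + ("DW_TAG_typedef",)):
    """Skip typedef/cv-qualifier DIEs to reach the underlying type."""
    while die is not None and die.tag in tags:
        die = type_of(die)
    return die


def type_name(die):
    """Printable leaf type: typedef name if any, '*' appended for pointers."""
    die = strip(die, CV)
    if die is None:
        return "void"
    if die.tag == "DW_TAG_pointer_type":
        return type_name(type_of(die)) + "*"
    name = attr(die, "DW_AT_name")
    return name.decode() if name else die.tag


def walk_dims(name, addr, counts, elem, esize, out):
    """Expand one subscript per array dimension, row-major as in C."""
    if not counts:
        flatten(name, addr, elem, out)
        return
    stride = esize
    for c in counts[1:]:
        stride *= c
    for i in range(counts[0]):
        walk_dims(f"{name}[{i}]", addr + i * stride, counts[1:], elem, esize, out)


def flatten(name, addr, tdie, out):
    """Append (path, address, type-name) rows for every leaf under tdie."""
    base = strip(tdie)
    tag = base.tag if base is not None else None
    if tag in ("DW_TAG_structure_type", "DW_TAG_union_type"):
        for m in base.iter_children():
            if m.tag != "DW_TAG_member":
                continue
            mname = attr(m, "DW_AT_name")
            path = f"{name}.{mname.decode()}" if mname else name
            # Union members usually omit the offset, hence the default of 0.
            off = attr(m, "DW_AT_data_member_location", 0)
            flatten(path, addr + off, type_of(m), out)
    elif tag == "DW_TAG_array_type":
        elem = type_of(base)
        counts = []
        for sub in base.iter_children():
            if sub.tag == "DW_TAG_subrange_type":
                count = attr(sub, "DW_AT_count")
                if count is None:
                    count = attr(sub, "DW_AT_upper_bound", -1) + 1
                counts.append(count)
        esize = attr(strip(elem), "DW_AT_byte_size")
        walk_dims(name, addr, counts, elem, esize, out)
    elif tag == "DW_TAG_enumeration_type":
        # Use the enum's underlying type if emitted; else guess intN_t by size.
        under = type_of(base)
        guess = f"int{8 * attr(base, 'DW_AT_byte_size')}_t"
        out.append((name, addr, type_name(under) if under is not None else guess))
    else:
        out.append((name, addr, type_name(tdie)))


def global_variables(elf_path):
    """Flatten every named global that has a fixed DW_OP_addr location."""
    from elftools.elf.elffile import ELFFile
    rows = []
    with open(elf_path, "rb") as f:
        dwarf = ELFFile(f).get_dwarf_info()
        for cu in dwarf.iter_CUs():
            for die in cu.get_top_DIE().iter_children():
                loc = attr(die, "DW_AT_location")
                name = attr(die, "DW_AT_name")
                # Keep variables whose location is DW_OP_addr (0x03) <address>.
                if (die.tag != "DW_TAG_variable" or not name
                        or not isinstance(loc, list) or not loc or loc[0] != 0x03):
                    continue
                raw = bytes(loc[1:1 + cu["address_size"]])
                flatten(name.decode(), int.from_bytes(raw, "little"), type_of(die), rows)
    return rows
```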

Employed Russian
  • Thanks, I'll give it a look. I wrote a program last night using libelf, as another person had suggested, but couldn't dig very deep into the data types. – Ben S Dec 28 '18 at 16:46
  • After a bit of digging, it does appear that libdwarf has all of the information I need. However, I feel that using the library is much more complex than parsing gdb's output. My algorithm is to gather the symbols via `info variables` and recursively call `ptype` on everything. For the moment, I'm throwing out symbols that are multidimensional arrays, deeply nested pointers (int ***), or function pointers. It seems the amount of work that went into a tool like gdb is much more than I anticipated, and I just have to put my own custom twist on it. Thanks! – Ben S Dec 28 '18 at 21:29