
I have an Elf class that reads ELF headers and constructs various data structures. It uses mmap to map the file and stores the returned address in a member variable, maddr (see the class extract below). A pointer to the Elf object is passed to a DwarfSymTab class (which relies on a Dwarf class that does the basic reading of the DWARF sections), but in the DwarfSymTab constructor, maddr has suddenly changed from the value it had when it was first initialized.
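For reference, here is a minimal sketch of mapping a file read-only and keeping its base address in a member, as the Elf class does with maddr. It assumes a POSIX system with open/fstat/mmap. The class MappedFile and its methods are hypothetical and are not the author's code.

```cpp
#include <cassert>
#include <cstddef>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Hypothetical stand-in for the mmap part of the Elf class: maps a whole
// file read-only and keeps the base address (the equivalent of maddr).
class MappedFile {
public:
    explicit MappedFile(const char *path) {
        int fd = ::open(path, O_RDONLY);
        if (fd < 0) return;
        struct stat st;
        if (::fstat(fd, &st) == 0 && st.st_size > 0) {
            std::size_t len = static_cast<std::size_t>(st.st_size);
            void *p = ::mmap(nullptr, len, PROT_READ, MAP_PRIVATE, fd, 0);
            if (p != MAP_FAILED) {  // failure is MAP_FAILED, not nullptr
                addr_ = static_cast<const unsigned char *>(p);
                size_ = len;
            }
        }
        ::close(fd);  // the mapping remains valid after the fd is closed
    }
    ~MappedFile() {
        if (addr_) ::munmap(const_cast<unsigned char *>(addr_), size_);
    }
    MappedFile(const MappedFile &) = delete;             // exactly one owner
    MappedFile &operator=(const MappedFile &) = delete;  // of the mapping

    bool ok() const { return addr_ != nullptr; }
    std::size_t size() const { return size_; }
    // Byte access into the mapping, bounds-checked in debug builds.
    unsigned char at(std::size_t i) const {
        assert(ok() && i < size_);
        return addr_[i];
    }

private:
    const unsigned char *addr_ = nullptr;
    std::size_t size_ = 0;
};
```

Mapping an ELF file this way, `at(0)` through `at(3)` would be `0x7f 'E' 'L' 'F'`, which matches the `"^?ELF"` that dbx prints for a good maddr further down.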

I am using dbx to watch this change, but dbx complains that a variable is out of scope even though it is quite definitely a member of the class. Below are some extracts of the class definitions and the dbx output.

class Elf {
    friend  class DwarfSymTab;
    friend  class SymTab;
    friend  class Dwarf;
    friend  class HostFunc;
    int fd;
    ElfEhdr ehdr;
    ElfEhdr *h;
    uchar   magic[4];
    ElfSect *symtab;
    ElfSect *symstr;
    ElfSect *dynsym;
    ElfSect *dynstr;
    ElfSect *bss;
    ulong   dynamic;
    struct  stat    st;
    uchar   *maddr;
    . . .
};

class DwarfSymTab : public SymTab {
private:
    Elf     *elf;
    Dwarf       *dwarf;
    DwarfRec    *r;
    DwarfUType  *dt;
    DwarfLine   *dline;
    MemLayout   memlayout;
    . . .
};

Localproc *HostFunc::coreopen(int corefd, int stabfd){
    . . .
    e[STAB] = new Elf();
    if(e[STAB]->fdopen(stabfd) < 0){
        fprintf(stderr, "Elf::open: %s\n", e[STAB]->perror());
        exit(1);
    }

stopped in HostFunc::coreopen at line 858 in file "hostfunc.cc"
  858           e[CORE] = new Elf();
(dbx) print e[0], e[0]->maddr
e[0] = 0x81b26bc
e[0]->maddr = 0xfe6ce000 "^?ELF^A^A^A^F^A"

DwarfSymTab::DwarfSymTab(Core* c, Elf *e, SymTab *i, long reloc)
 :SymTab(c, e, i, reloc) {
    elf = e;
    r = new DwarfRec;
}

stopped in DwarfSymTab::DwarfSymTab at line 30 in file "dwarfsymtab.cc"
   30           elf = e;
(dbx) next
stopped in DwarfSymTab::DwarfSymTab at line 31 in file "dwarfsymtab.cc"
   31           r = new DwarfRec;
(dbx) print e, e->maddr
e = 0x81b26bc
e->maddr = 0x69727473 "<bad address 0x69727473>"
(dbx) print elf
dbx: "elf" is not defined in the scope `pi`dwarfsymtab.cc`DwarfSymTab::DwarfSymTab(Core*,Elf*,SymTab*,long)`

(dbx) print e->fd
e->fd = 6

You can see that the pointer to the Elf object is intact and that maddr has been trashed in some way, while other members, for example fd, are fine. (This is the fd of the file from which the ELF data is being read; I have omitted the dbx output showing this, but it is so.)

Can anyone explain what might be going on?

Postscript: I have tracked the change in 'maddr' down to a function call. This is how dbx shows it. I first stop in the following function, having arranged to pass the pointer to the Elf object as an argument:

void SymTab::read(Elf *e){
    const char *error;

    trace( "%d.read()", this ); VOK;
    trace( "symtab modified %d", modtime() );
    _root = 0;
    if( error = gethdr(e) )
        _warn = sf( "symbol table header: %s; go on", error );

stopped in SymTab::read at line 164 in file "symtab.cc"
  164           trace( "%d.read()", this );     VOK;
(dbx) where
=>[1] SymTab::read(this = 0x81b757c, e = 0x81b25fc), line 164 in "symtab.cc"
  [2] HostCore::open(this = 0x81b24a0), line 512 in "hostcore.cc"
  [3] HostProcess::open(this = 0x81b1c24, ischild = 0), line 318 in "hostcore.cc"
  [4] TermAction(parent = 0x818b294, obj = 0x81b1c24, pick = 0), line 160 in "term.cc"
  [5] TermServe(), line 238 in "term.cc"
  [6] PadsServe(n = 0), line 292 in "term.cc"
  [7] main(argc = 1, av = 0xfeffdd68), line 75 in "pi.cc"

At this point, the pointer to the memory-mapped region is correct, and fd is that of the file from which the ELF sections are being read:

(dbx) print e, e->fd, e->maddr
e = 0x81b25fc
e->fd = 7
e->maddr = 0xfe6ce000 "^?ELF^A^A^A^F^A"

Stepping over a few lines . . .

(dbx) next
stopped in SymTab::read at line 167 in file "symtab.cc"
  167           if( error = gethdr(e) )

and then stepping into the routine that reads the DWARF structures:

(dbx) step
stopped in DwarfSymTab::gethdr at line 36 in file "dwarfsymtab.cc"
   36           switch(elf->encoding()){
(dbx) where
=>[1] DwarfSymTab::gethdr(this = 0x81b757c, e = 0x81b25fc), line 36 in "dwarfsymtab.cc"
  [2] SymTab::read(this = 0x81b757c, e = 0x81b25fc), line 167 in "symtab.cc"
  [3] HostCore::open(this = 0x81b24a0), line 512 in "hostcore.cc"
  [4] HostProcess::open(this = 0x81b1c24, ischild = 0), line 318 in "hostcore.cc"
  [5] TermAction(parent = 0x818b294, obj = 0x81b1c24, pick = 0), line 160 in "term.cc"
  [6] TermServe(), line 238 in "term.cc"
  [7] PadsServe(n = 0), line 292 in "term.cc"
  [8] main(argc = 1, av = 0xfeffdd68), line 75 in "pi.cc"

At this point, maddr has clearly been trampled on, but nothing else has:

(dbx) print e, e->fd, e->maddr
e = 0x81b25fc
e->fd = 7
e->maddr = 0x69727473 "<bad address 0x69727473>"
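The corrupted value is suggestive: 0x69727473 is made of four printable ASCII bytes. This minimal sketch (the function name is hypothetical) returns the bytes of a 32-bit value in the order they sit in memory. On a little-endian machine such as x86 they spell "stri", which hints, without proving it, that part of a character string is being read where the pointer was expected.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <string>

// Returns the bytes of a 32-bit value in the order they are stored in memory
// on the machine running this code. On a little-endian CPU, 0x69727473 is
// stored as 73 74 72 69, i.e. the characters "stri".
std::string bytes_in_memory_order(std::uint32_t value) {
    char buf[sizeof value];
    std::memcpy(buf, &value, sizeof value);
    return std::string(buf, sizeof value);
}
```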

Since no memory allocation is happening across the function call (as far as I know), what could be causing this? 'gethdr' in the DwarfSymTab class overrides a virtual function in the base 'SymTab' class; I don't know whether this has any bearing on how the function is called.

I apologize for the amount of detail, but as I pointed out, the problem can't really be simplified (or perhaps doing so is beyond my abilities).

Post-postscript: In fact the situation is worse than this. It is true that after stepping into the 'gethdr' function as above, the 'maddr' member of the Elf object is incorrect, but if I go back up the stack to the calling frame, everything is fine:

(dbx) next
stopped in SymTab::read at line 167 in file "symtab.cc"
  167           if( error = gethdr(e) )
(dbx) step
stopped in DwarfSymTab::gethdr at line 36 in file "dwarfsymtab.cc"
   36           switch(elf->encoding()){
(dbx) print e, e->fd, e->maddr
e = 0x81b25fc
e->fd = 7
e->maddr = 0x69727473 "<bad address 0x69727473>"
(dbx) up
Current function is SymTab::read
  167           if( error = gethdr(e) )
(dbx) print e, e->fd, e->maddr
e = 0x81b25fc
e->fd = 7
e->maddr = 0xfe6ce000 "^?ELF^A^A^A^F^A"

This simply doesn't make any sense to me.
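One mechanism that would produce exactly this pattern (the same object address, but a correct maddr in one frame and garbage in the other) is the layout disagreement raised in the comments: if symtab.cc and dwarfsymtab.cc were compiled against different definitions of Elf, the code in each file reads maddr at a different offset within the same object, and a debugger using each file's own debugging information may show the same disagreement. The sketch below only simulates this, using two hypothetical structs, ViewA and ViewB, and reading raw bytes with memcpy instead of actually violating the One Definition Rule.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical layouts: what two translation units might believe the same
// class looks like. Only ViewA (an older or newer header) has 'tag'.
struct ViewA {
    int *other;
    char tag[sizeof(void *)];
    unsigned char *maddr;
};
struct ViewB {
    int *other;
    unsigned char *maddr;  // sits where ViewA has 'tag'
};

// Reads a pointer-sized value at 'offset' from raw object bytes, which is
// what compiled code does when it accesses a member at a fixed offset.
std::uintptr_t word_at(const void *obj, std::size_t offset) {
    assert(obj != nullptr);
    std::uintptr_t w;
    std::memcpy(&w, static_cast<const unsigned char *>(obj) + offset, sizeof w);
    return w;
}
```

Reading an object built with ViewA's layout at ViewB's offset of maddr returns the bytes of tag; with "stri" stored there, a little-endian machine yields 0x69727473, the value dbx printed.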

N. Hunt
  • With all of that raw pointer usage, you're surprised that you have bugs? Second, post a [mcve]. – PaulMcKenzie Jul 14 '19 at 00:45
  • Such symptoms are usually a sign of some code doing something it shouldn't with a pointer - dereferencing a NULL, falling off the end of an array, using a dangling pointer (which points at an object that no longer exists), etc etc. Quite often, the cause of the problem is in code completely unrelated to the code where the symptom is seen. It is also quite common for the symptom not to appear immediately. Without a [mcve] it's virtually impossible to guess what the cause is - because it could be almost anything, and people aren't good at picking problems in code they don't see. – Peter Jul 14 '19 at 02:09
  • Thanks for your comments. Unfortunately, given the nature of what I am working on, even a minimal working example is going to be far from trivial. The fact that the pointer is getting disturbed after a class instantiation makes me think that the allocator is involved. This program links against a library with an overridden 'new'. I am going to try removing the dependence on that to see if it has any effect. – N. Hunt Jul 14 '19 at 02:49
  • The value for `e->maddr` looks like ASCII text ("stri" for little endian architecture). Maybe you violate the One Definition Rule, and different source files have different ideas of how big one or more of your classes is? Is the `Elf *e` pointer passed to `DwarfSymTab` still valid, or is it dangling? There are too many unknowns. If possible, try using your debugger to set a hardware breakpoint to stop when `e->maddr` is changed. – 1201ProgramAlarm Jul 14 '19 at 06:10
  • What @1201ProgramAlarm said, and the ODR violation is sometimes a byproduct from an internally binary-incompatible build, itself a byproduct of broken build scripts/makefiles that can't deal with a build tree from before a project structure change (adding/renaming sources, etc.), or that don't capture all dependencies correctly. Thus: **delete** all build folders and build again. Do not use any sort of "clean" or "rebuild" from the IDE or from the makefile. And don't even think about doing in-source builds, it's always disastrous, even if it may superficially appear to work OK. – Kuba hasn't forgotten Monica Jul 14 '19 at 06:26
  • If you have an in-source build, I trust that you use revision control and can recreate a source-only checkout, with no build junk, and then build in a dedicated build folder. Do not fall into temptation of bypassing this: make sure you pushed the changes, move `.git` or `.svn` out, wipe the folder, move the dot folder back in, check out. – Kuba hasn't forgotten Monica Jul 14 '19 at 06:29
  • I have solved this problem by not using mmap to map a file for reading. I have resorted to simple lseek/read operations for the various structures I am interested in and all the previous problems have gone. It was very frustrating because I was witnessing unexplained scope problems with variables which were supposed to be in the class hierarchy amongst other things. I imagine mmap and C++ don't go together well. – N. Hunt Jul 15 '19 at 05:23
  • In reply to 1201ProgramAlarm, I did try to set a watch on e->maddr, but dbx wasn't helping very much. Perhaps the aforementioned scope problems were the reason, I just don't know. Many thanks to all for the various suggestions. – N. Hunt Jul 15 '19 at 05:25
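Building on the One Definition Rule suggestion in the comments above, one hedged way to test that hypothesis is to have each translation unit report the layout of Elf it was compiled with. This is a sketch under assumptions: the Elf below is a trimmed, hypothetical stand-in with public members. The real class's members are private, so offsetof(Elf, maddr) would have to be evaluated somewhere with access (for example in one of its friend classes), or the check reduced to sizeof(Elf). The names ElfLayout, elf_layout and elf_layout_logged are made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>

// Hypothetical stand-in; in practice this would come from the shared header.
// Members are public here only so that offsetof() can name them.
struct Elf {
    int fd;
    unsigned char magic[4];
    unsigned long dynamic;
    unsigned char *maddr;
};

// Unnamed namespace: each .cc file that contains this gets its own copy, so
// each reports the layout *it* was compiled with. (An inline function in a
// shared header would not work: the linker would keep only one copy.)
namespace {
struct ElfLayout {
    std::size_t size;
    std::size_t maddr_offset;
};

ElfLayout elf_layout() { return {sizeof(Elf), offsetof(Elf, maddr)}; }

// Printed once at start-up, tagged with the source file name.
[[maybe_unused]] const int elf_layout_logged = [] {
    ElfLayout l = elf_layout();
    assert(l.maddr_offset < l.size);
    std::fprintf(stderr, "%s: sizeof(Elf)=%zu offsetof(Elf, maddr)=%zu\n",
                 __FILE__, l.size, l.maddr_offset);
    return 0;
}();
}  // namespace
```

If symtab.cc and dwarfsymtab.cc print different numbers, the two files were built against different definitions of Elf (for example from stale object files or a conditional `#ifdef` in a header), which points at the build rather than at mmap.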
