[Deep breath.] We have an application that pops up a window using wxMotif 2.6.3 (the GUI library was not, and is not, my choice). It runs fine on 32-bit x86 systems. I was given the task of converting it to a 64-bit application, and the 64-bit build always segfaults. I'm on RHEL 6, so I compiled using gcc 4.4.7. After much gnashing of teeth, the problem seems apparent: in wxFrame::DoCreate, m_mainWidget is set (correctly); in wxFrame::GetMainWidget, it is returned as a null pointer. The null pointer results in the crash. According to gdb, the instruction that sets m_mainWidget is

mov    %rax,0x1e0(%rdx) # $rdx = 0x68b2f0

whereas the code that gets m_mainWidget is

mov    0x1f0(%rax),%rax # $rax = 0x68b2f0

In gdb, I can examine the memory and see that the pointer at 0x68b4d0 is correct. Why is the offset incorrect?
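
For reference, the arithmetic behind those two instructions can be sketched as follows. The constant and function names are made up for illustration; the numeric values are the ones gdb reported above.

```cpp
#include <cstdint>

// Values taken from the gdb session above; the names are hypothetical.
constexpr std::uint64_t kObject    = 0x68b2f0;  // $rdx in the set, $rax in the get
constexpr std::uint64_t kSetOffset = 0x1e0;     // offset used by the store
constexpr std::uint64_t kGetOffset = 0x1f0;     // offset used by the load

// Address the store writes m_mainWidget to.
constexpr std::uint64_t store_address() { return kObject + kSetOffset; }

// Address the load reads m_mainWidget from.
constexpr std::uint64_t load_address() { return kObject + kGetOffset; }
```

The store lands at 0x68b4d0, where the pointer is visible in memory, while the load reads 16 bytes further on, at 0x68b4e0.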

To confuse things even more, when I use objdump to disassemble libwx_motifd_core-2.6.so.0.3.1, the "get" assembly is

  mov    0x1e0(%rax),%rax

In objdump, both the get and the set use 0x1e0 as the offset. What is going on?

I've uploaded some relevant info here: GitHub

I've included a small program that replicates the problem on my system.

Investigating further, I see in the disassembly of wxFrame::DoCreate that later uses of m_mainWidget retrieve the value using 0x1e0 as the offset. (The disassembly is from a build where I used -O0, so the code has to go back to memory each time.) "Just for fun," I added a new member variable to wxFrame, m_myMainWidget, and set it right after m_mainWidget was set. I then had wxFrame::GetMainWidget() return the new member (m_myMainWidget). Wouldn't you know it: the crash still occurs, and GetMainWidget contains the same +16 offset when I disassemble from within gdb. (The extra offset is not there when I use objdump to disassemble.)

VividD
John
  • Could this be a difference in compiler optimization levels? – BlackVegetable Jul 18 '14 at 17:24
  • same behavior whether I use -O2 or -O0. – John Jul 18 '14 at 17:26
  • Could it be a (dynamic) linking problem? – BlackBear Jul 18 '14 at 17:27
  • Not knowing wxMotif, is there code between the get and the set? – DTSCode Jul 18 '14 at 17:29
  • Somewhere, you have two translation units, or two modules, built with different compiler settings or different macro definitions. As a result, these two modules don't agree on the binary layout of the class. E.g. `class MyWidget { MyInt a; MyInt b; };` If `MyInt` is, say, typedef'ed as 32-bit integer sometimes and 64-bit other times, then the offset to `MyWidget::b` would be different. – Igor Tandetnik Jul 18 '14 at 17:30
  • Can you post some more context around the instruction? Are instructions around correct or are they corrupt too? – Matteo Italia Jul 18 '14 at 17:30
  • Looking at https://github.com/tagged/wx/blob/master/include/wx/frame.h , there are macros that conditionally add or remove class members. That's scary. Be very very careful that you are defining these macros the same way in your application as they were defined when the library was built. – Igor Tandetnik Jul 18 '14 at 17:37
  • I posted stuff to https://github.com/hendrixjl/wxmotif_problem – John Jul 18 '14 at 17:45
  • @Igor: The "get" and "set" are both in the same translation unit. Sigh. – John Jul 18 '14 at 18:34
  • Maybe they are not in the same translation unit after all. The "get" is in an inline function in the declaration of the wxFrame class. – John Jul 21 '14 at 14:05
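
The `MyInt` example from Igor's comment can be sketched in a few lines. This is only an illustration: the two structs are hypothetical stand-ins for one class as seen by two differently configured builds, not anything from wxWidgets.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical: the "same" class as two translation units might see it
// when MyInt is typedef'ed differently in each build.
struct WidgetWide   { std::int64_t a; std::int64_t b; };  // MyInt is 64-bit
struct WidgetNarrow { std::int32_t a; std::int32_t b; };  // MyInt is 32-bit

// Byte offset of member b under each view of the class.
constexpr std::size_t offset_of_b_wide()   { return offsetof(WidgetWide, b); }
constexpr std::size_t offset_of_b_narrow() { return offsetof(WidgetNarrow, b); }
```

Code compiled against one view reads `b` at a different offset than code compiled against the other writes it, which is the same kind of disagreement as the 0x1e0 versus 0x1f0 offsets in the question.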

1 Answer

Based on @Igor's comment, I have looked at the class layouts using the -fdump-class-hierarchy compiler option. It turns out that there is indeed a vtable layout mismatch, due to this conditional block in include/wx/app.h:

#ifdef __WXDEBUG__
    virtual void OnAssert(const wxChar *file,
                          int line,
                          const wxChar *cond,
                          const wxChar *msg);
#endif // __WXDEBUG__

You need to make sure you compile your own code with the same __WXDEBUG__ setting that was used to build the library.
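
As a rough model of why this matters, a vtable can be pictured as a struct of function pointers. The sketch below is an assumption-laden illustration, not the real ABI layout: only OnAssert comes from the header above, while the other slot names, their order, and the helper functions are placeholders.

```cpp
#include <cstddef>

// Hypothetical model: one slot per virtual function, in declaration order.
using Slot = void (*)();

// Table as seen by code built with __WXDEBUG__ defined.
struct VTableWithDebug {
    Slot OnInit;
    Slot OnAssert;  // exists only when __WXDEBUG__ is defined
    Slot OnExit;
};

// Table as seen by code built without __WXDEBUG__.
struct VTableWithoutDebug {
    Slot OnInit;
    Slot OnExit;
};

// Index of the OnExit slot under each configuration.
constexpr std::size_t exit_slot_with_debug() {
    return offsetof(VTableWithDebug, OnExit) / sizeof(Slot);
}
constexpr std::size_t exit_slot_without_debug() {
    return offsetof(VTableWithoutDebug, OnExit) / sizeof(Slot);
}
```

In this model every slot declared after OnAssert shifts by one when the macro is defined, so a caller built with one setting and a class built with the other disagree about which slot holds which function.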

Jester
  • I think the answer is here somewhere. I would have said that both calls were in the same translation unit, but I just moved the function definition for wxFrame::GetWidget into the ./src/motif/frame.cpp file and recompiled, and the problem seems to be fixed. I'm not sure how *any* of the macros (like __WXDEBUG__) could have differed from one translation unit to the next, since I compiled them all using ./configure; make; make install without changing anything. Thanks for the help! – John Jul 21 '14 at 13:37
  • The macro was undefined when compiling your code, not when compiling the library. Your `MainApp` inherits from `wxAppConsole`, but without `__WXDEBUG__` its vtable is missing the entry for `OnAssert` and thus has a different layout than the library expects (because the library was compiled with `__WXDEBUG__`). – Jester Jul 21 '14 at 13:44
  • [Hitting myself in the head] How stupid of me! Thanks a bunch. You've given me a new tool to use (-fdump-class-hierarchy). – John Jul 21 '14 at 18:23