1

At the very first, apologies in advance for the very indistinct presentation of the question - if I knew what to ask for, I would probably know how to fix it. ... But I do not even have a faint theory, seven years of C++ experience notwithstanding. Any helpful pointers (hè) will be most appreciated.

  • Possibly related to this question. (same symptoms)
  • Not related to that one. (no explicit function pointers here)

Problem: A count of objects with certain properties, which themselves are checked by a member function of their class shows incorrect results. The source is that the check happens with gibberish values. The source of that is that the pointer "this" changes between calling the function and entering its body. Once the body is left, the pointer is again correct.

Sadly, the solution for the related question does not work here.

I am not able to produce a minimal example of the problem. Furthermore, literally hundreds of member functions are being called correctly in the same program, as far as I know without exhibiting this behaviour.

  • What can I do? As a bandaid, replacing the function call with a copy of its body works, but this is no proper solution.

I am at a complete loss as to how to proceed with my diagnosis.

Concise: What steps can I follow to attain greater insight into the nature of the problem?

A short checklist of things already taken care of:

  • The objects in question are properly initialised at the time of the call.
  • All optimisations are off. No inlining. This is a debug build with the appropriate settings in effect.
  • Cleaning and rebuilding the project has not yielded a different result.
  • Recompiling with the original (but retyped) function call after the bandaid solution had been tested successfully led to a return of the problem.
  • There are no compiler warnings in the compilation unit involved (warning level 3), specifically disabled project-wide are:
    • C4005 (macro redefinition, due to using a custom/hacked Windows SDK for compatibility reasons - this was originally a Win95 program)
    • C4244 (implicit cast to smaller type, due to legacy code waiting to be refactored - those are all float-to-int conversions that lack an explicit cast, all 800+ instances of them)
    • C4995 (calling function marked with #pragma deprecated, due to C-lib functions being called without preceding underscore - hoping to eventually switch to GCC)
  • "control flow guard" and "basic runtime checks" are enabled, but do not trigger.

And a hint that may or may not be relevant, but which I cannot interpret at the moment:

  • For the very first hex, IsSea is called normally, that is: "this" inside is identical to "this" outside
  • Only in all hexes that follow does the bug happen.
    • The altered "this" pointer does not point to the first hex though, it seems to hit unallocated space.

Here is an extract of how it looks like:

Header:

// These three are actually included from other files.
constexpr unsigned int AREA_none = 0u;
constexpr unsigned int AREA_Lake = 1u;
enum Terrain
{
  OCEAN = 8,
  TERRA_INCOGNITA = 10
};

class CHex
{
public:
  CHex(); // initialises ALL members with defined error values
  bool ReadHex(); // Init-type function calling the problematic one
  bool IsSea() const // problematic function
  { return this->_Area != AREA_none && this->_Area != AREA_LAKE && this->_nTerrain == Terrain::OCEAN; }
  // The body does the right thing - just WITH the wrong thing.
protected:
  unsigned int _Area;
  int _nNavalIndex;
  Terrain _nTerrain;

  static int _nNavalCount = 0;

  // There are a lot more functions in here, both public and protected.
  // The class also inherits a bunch from three other classes, but no virtual functions and no overlaps are involved.
}

Source:

CHex::CHex() : _Area{0u}, _nNavalIndex{0}, _nTerrain{Terrain::TERRA_INCOGNITA}
{}
bool CHex::ReadHex()
{
  // Calls a lexer/parser pair to procure values from several files.
  // _Area and _nTerrain are being initialised in this process.
  // All functions called here work as expected and produce data matching the source files.

  // _Area and _nTerrain have the correct values seen in the source files at this point.
  if(this->IsSea()) // but inside that it looks as if they were uninitialised
    // This ALWAYS happens because the function always returns true.
    _nNavalIndex = _nNavalCount++;
  // Stopping at the next instruction, all values are again correct
  // - with the notable exception of the two modified by the instruction that should not have happened.

  // If I replace that with the following, I receive the correct result:
  /*
  // direct copy of the function's body
  if(this->_Area != AREA_none && this->_Area != AREA_Lake && this->_nTerrain == Terrain::OCEAN)
    _nNavalIndex = _nNavalCount++; // only happens when it should; at the end, the count is correct
  */

  // Sanity checks follow here.
  // They too work correctly and produce results appropriate for the data.

  return true; // earlier returns exist in the commented-out parts
}

Sorry again for this big mess, but well, right now I am a mess. It's like seeing fundamental laws of physics change.

--

On advice from @Ben Voigt I hacked in a diagnostic that dumps the pointers into a file. Behold:

Before ReadHex: 20A30050 (direct array access) On ReadHex:  20A30050    On IsSea:   20A30050 (with members: 0, 8) After ReadHex:    20A30050 
Before ReadHex: 20A33EAC (direct array access) On ReadHex:  20A33EAC    On IsSea:   20A33EAC (with members: 2, 0) After ReadHex:    20A33EAC 
Before ReadHex: 20A37D08 (direct array access) On ReadHex:  20A37D08    On IsSea:   20A37D08 (with members: 2, 0) After ReadHex:    20A37D08 
Before ReadHex: 20A3BB64 (direct array access) On ReadHex:  20A3BB64    On IsSea:   20A3BB64 (with members: 3, 0) After ReadHex:    20A3BB64 
Before ReadHex: 20A3F9C0 (direct array access) On ReadHex:  20A3F9C0    On IsSea:   20A3F9C0 (with members: 4, 3) After ReadHex:    20A3F9C0 
Before ReadHex: 20A4381C (direct array access) On ReadHex:  20A4381C    On IsSea:   20A4381C (with members: 3, 0) After ReadHex:    20A4381C 
[...]

They are all correct. Every single one of them. And even better: The function now evaluates correctly! Here is the changed source (I am omitting the comments this time):

Header:

// These three are actually included from other files.
constexpr unsigned int AREA_none = 0u;
constexpr unsigned int AREA_Lake = 1u;
enum Terrain
{
  OCEAN = 8,
  TERRA_INCOGNITA = 10
};

extern FILE * dump;
class CHex
{
public:
  CHex();
  bool ReadHex();
  bool IsSea() const {
    fprintf(dump, "\tOn IsSea:\t%p (with members: %u, %i) ", (void*)this, this->_Area, this->_nTerrain);
    return this->_Area != AREA_none && this->_Area != AREA_LAKE && this->_nTerrain == Terrain::OCEAN; }
protected:
  unsigned int _Area;
  int _nNavalIndex;
  Terrain _nTerrain;

  static int _nNavalCount = 0;

  // lots more functions and attributes
}

Source:

CHex::CHex() : _Area{0u}, _nNavalIndex{0}, _nTerrain{Terrain::TERRA_INCOGNITA}
{}
bool CHex::ReadHex()
{
  fprintf(dump, "On ReadHex:\t%p ", (void*)this);

  // Calls a lexer/parser pair to procure values from several files.
  // _Area and _nTerrain are being initialised in this process.

  if(this->IsSea()) // Suddenly works!?
    _nNavalIndex = _nNavalCount++;

  // Sanity checks follow here.

  fprintf(dump, "After ReadHex:\t%p ", (void*)this);
  return true;
}

The additional outputs (as well as the initialisation and closing of dump) come from the next higher level in the control flow, another function in another class, where the loop over all hexes resides. I omitted that for now, but will add it if someone thinks it's important.

And apropos that. It now looks to me as if this fault were a result of bugs in the tools, not in the code. As a matter of fact, even though the function now evaluates correctly, the debugger still shows the wrong pointer from before and its nonsensical members.

Community
  • 1
  • 1
Zsar
  • 443
  • 3
  • 14
  • It might be possible to gain further insight: from the calling code, where the list/set of provinces and the iteration is visible; from the assembly, especially since there are no optimizations (specifically: how is the pointer changed). – dyp Oct 01 '15 at 22:42
  • Is there any inheritance in the real code? Sometimes an offset will be applied to a `this` pointer due to inheritance. – Ben Voigt Oct 01 '15 at 22:43
  • 3
    `_Area` is a reserved identifier (reserved for the C++ implementation = compiler + linker + std lib), as are all identifiers beginning with an underscore followed by a capital letter. – dyp Oct 01 '15 at 22:44
  • have you tried running this through valgrind or similar program? – Steve Lorimer Oct 01 '15 at 22:44
  • While I realise you've put effort into asking the question, you still haven't given enough information. In particular, you need to give a small but complete sample of code that exhibits the problem (the `main()` function, etc) so other people have a chance of recreating it. Odds are, some code in your project you haven't shown is overwriting memory it shouldn't. Trying to create a small/complete sample will hopefully eliminate unrelated code, and make it easier to find the problem. – Peter Oct 01 '15 at 22:45
  • Please don't put the details of the problem in code comments. That makes them hard to read because of the SO formatting. – Barmar Oct 01 '15 at 22:45
  • 4
    You forget to return the value from `IsSea` – Andrey Nasonov Oct 01 '15 at 22:45
  • 2
    I wonder how he managed to compile without any warnings with that mistake. – Barmar Oct 01 '15 at 22:47
  • @Barmar: The corresponding warning is [off by default](https://msdn.microsoft.com/en-us/library/k64a6he5.aspx) – Ben Voigt Oct 01 '15 at 22:49
  • @BenVoigt What level is the warning about not having any `return` statement in a function with a return type declared? – Barmar Oct 01 '15 at 22:51
  • @Barmar: [Yes, that should have error'd](https://msdn.microsoft.com/en-us/library/ft5xye74.aspx) – Ben Voigt Oct 01 '15 at 22:55
  • I do not know how it got left out, but the return statement is there in the code. Now also corrected in-post. – Zsar Oct 01 '15 at 23:06
  • @dyp : The provinces are currently all stored in an array of static size. The proper "this"es point to consecutive elements of that array. The changed "this" points to outside the array. Re. underscores: Well, this worked before my latest set of changes, which produced the problem. The code is full of this naming convention. I do not define new variables in that pattern, but I do not think I need to touch the old stuff if it worked before. ... Actually, last I checked e.g. with user-defined literals it is the other way around: leading underscore is mandatory, without is reserved. – Zsar Oct 01 '15 at 23:32
  • @Ben Voigt : Yes, the class inherits from three others. Both the calling and the called member functions are only defined in this leaf though. No namesakes exist in the parent classes. - Should this not mean that IF, then BOTH functions should exhibit wrong pointers? – Zsar Oct 01 '15 at 23:33
  • @Steve Lorimer : I use VS's debugger. I do think that qualifies for a "similar program"? What kind of functionality do you have in mind? – Zsar Oct 01 '15 at 23:34
  • Multiple inheritance (`A : B, C, D` which is different from multi-level inheritance `A : B : C : D`) definitely requires adjustment of the `this` pointer when moving between levels. Doesn't sound like that is happening here, though. – Ben Voigt Oct 01 '15 at 23:34
  • @Peter : I am sorry, I explicitely stated that I cannot create a minimal example, because I do not know how to produce such a fault. If this means that I cannot ask this question, please inform a moderator to have it deleted. – Zsar Oct 01 '15 at 23:35
  • BTW, have you tried `printf("%p", (void*)this)` in all of the functions involved? – Ben Voigt Oct 01 '15 at 23:36
  • @Ben Voigt : It is the first kind, "A : B, C, D". But again, "moving between levels" does not happen where the fault happens - and where it does happen, the fault happens not. – Zsar Oct 01 '15 at 23:37
  • Wll, Zsar, you're hosed if you're not willing to make the effort to create a small but complete example. The only constraint on that is the effort you are willing to expend. People are not mind readers. You have your code. Other people can only guess at what it is doing. – Peter Oct 01 '15 at 23:38
  • @Ben Voigt : The debugger shows those values. I have breakpoints on the very line of the call, inside the one-line function body and on the next instruction. Do you think I should distrust it? ... Also, the site tells me to chat instead of spamming comments. Then it tells me that I am not allowed to chat due to insufficient reputation. I hope this impossible requirement does not cause me trouble. – Zsar Oct 01 '15 at 23:40
  • @Zsar: We understand that [the site fails to check reputation of all parties when issuing that suggestion](http://meta.stackoverflow.com/q/275771/103167), because it saves a lot of database lookups. And we work around it. And I do think you should distrust the debugger, or at least get yourself a list of pointers to main object and all base subobjects, so you can see whether the `this` pointer, as it evolves, matches any of them. – Ben Voigt Oct 01 '15 at 23:45
  • If an object is an instance of more than one class, it can have a different `this` pointer depending on the class the member function is a member of. – David Schwartz Oct 02 '15 at 00:05
  • @Ben Voigt : Oh my god, you are right. Dumping the pointer and the members into a file INSIDE the IsSea function yields the CORRECT values. Furthermore, the function now evaluates CORRECTLY. - Meanwhile the debugger STILL shows the wrong pointer. Adding what I did to the OP in a moment. This looks like a genuine VS bug, does it not? – Zsar Oct 02 '15 at 00:42
  • Did the program actually misbehave, or is the only symptom strange values in the debugger? – M.M Oct 02 '15 at 02:21
  • @Zsar The identifier is reserved because there's an uppercase letter after the leading underscore. `_nNavalIndex` and `_km` are fine, `_Area` and `_MOhm` are reserved. Please see [lex.name]p3 in the C++ Standard. – dyp Oct 02 '15 at 08:02
  • @M.M : Yes, the program did misbehave. The conditional 'if' evaluated as true and the code below it was executed **before** I added the fprintf; after I had added the fprintf, the debugger _still_ showed the wrong values, but **now** the actual evaluation was correct (and the printed values are correct as well), the conditional code was **then** skipped as appropriate. – Zsar Oct 02 '15 at 11:39
  • @dyp : Okay, thank you for that reference. Unfortunately, as if tempting fate, this is exactly how my progenitors named (almost) **all** the member attributes **everywhere**: leading underscore, then a capital letter. Prefixed type information like that 'n' ("natural number") are used sparsely. I am going to make it a point to rename them all in the near future. Still, an earlier rendition of the program performed correctly (at this specific point, at least), so I hope that for this specific problem it is not relevant. - Definitely going to keep it "on the radar" though, so to write. – Zsar Oct 02 '15 at 11:44

2 Answers2

1

EDIT for OP edit:

This now smells even more like a ODR violation. Changing an inline function, and having that change program behavior is exactly what could happen with the undefined behavior induced from ODR violations. Do you use templates anywhere? Also, try de-inlining IsSea() in the original version to see if that helps.

(original answer): This smells like one of three things to me.

First, it could be a one-definition-rule violation for the function in question. Make absolutely certain there aren't multiple versions in different translation units, or different compilation settings in different units.

Secondly the compiler could be doing something because of your use of the reserved name _Area. Regardless of anything else you should fix this problem.

Thirdly, VC++ can utilize different mechanisms for member function pointers, and possibly one of those is affecting your case here (even given that you don't show use of member function pointers). See https://msdn.microsoft.com/en-us/library/83cch5a6.aspx?f=255&MSPPError=-2147217396 for some information. Another possibility is that the compiler options for such pointers are different across translation units.

Mark B
  • 95,107
  • 10
  • 109
  • 188
  • Mmh, let's see... The function is not defined for any other class in the same hierarchy. That should take care of point one, right? If that pragma is not present anywhere in the code and the command line options for the compiler do not display any of the possible flags, I should be save from point three, no? Should I try to manually add /vmg? As for point two... I really hope this is not it. Every member I have not touched has a leading underscore, this in a project of 744 files. Of course I could just mass-replace, but... – Zsar Oct 02 '15 at 00:18
  • You might want to take another look at the OP, which I just appended to. @Ben Voigt provided pointers (hè) which led me to discover additional detail and additional symptoms. ... I do not think any of the three points you raise is at work here. – Zsar Oct 02 '15 at 01:02
  • Re ODR violation, could you be more specific as to what to look out for? Because, yes, there is extensive use of templates in other places of the code, but it seems to be _semantically_ isolated from this part of it: CProvince does not inherit from a template. No function in the call stack above the problematic call to IsSea is a template. No parent or other child of a parent of CProvince has an IsSea function. There are only two declarations of IsSea, their signatures and return type are identical, but the classes they are in are disjunct and defined in different translation units. – Zsar Oct 02 '15 at 11:34
  • Inlining is explicitely disabled in the compiler options. I would assume that this prevents ANY inlining from happening? I will try what happens when I move the definition of IsSea into the source file, as you advise, and report back. – Zsar Oct 02 '15 at 11:36
  • Mmh... I tried testing to move the definition of IsSea out of the header and into the source, but... after I restored the status quo by removing the diagnostic lines, the function now _still_ works. Even completely rebuilding the program does not yield the faulty behaviour again. - Well... it works now, even though all is exactly as before when it did not work. I guess that is some kind of a fix. – Zsar Oct 02 '15 at 18:22
1

This is a very sad answer, but alas, sometimes a sad answer is nevertheless correct.

Let us start with the smaller but at least somewhat useful part:

The values VS2015's debugger displays for the this pointer - and in extension all members of the object pointed to - are indeed incorrect and in a very reproducable way:

If a breakpoint is set on a member function defined in a header file, "this" displayed in the debugger will display - the entry point of this function. It will still have the type of the object in question and display all the members... but as those values are populated as offsets from the entry point of said function, their displayed contents are of course nonsensical. All of this is purely a UI issue and does not affect the actual program execution.

And now the useless and depressing part:

The fault which originally prompted me to open this topic, which persisted through several compilations with different build settings - is gone and can no longer be reproduced after I put in the fprinf commands to dump the pointer addresses to a file in order to discover the bug described above. Even though the code is letter by letter identical to the formerly faulty code, it now works flawlessly. Further alterations have done nothing to change this. I cannot bring it back, try as I might.

This whole dastard thing - was a fluke. Somehow. Which means that it may happen again, at any time, for no apparent reason, without any means of prevention whatsoever. Great, is it not?

...

In any case, heartfelt thanks to @Ben Voigt for raising the notion that mayhap those debugger outputs may not be related to reality in the first place. Similarly, thanks to @dyp for pointing out and explaining an unrelated potential issue (names prefixed by '_' followed by a capital letter being reserved expressions) I was not aware of before. Thanks to @Mark-B for actually providing hypotheses about the problem, even if none of them proved to be correct. At least there were some and one might have been.

Zsar
  • 443
  • 3
  • 14
  • 2
    FWIW, I'm seeing this in Visual Studio 2017 (15.4.2) right now. I'm trying to reproduce it in a small app so I can report it to MS. – skst Nov 06 '17 at 20:20
  • I kept seeing this in every single update of VS since I posted the question - now that I understand it's a debugger bug, I can ignore it during diagnosis, so at least it does not hold me back anymore. So far I have no idea how to reproduce it - while code changes fix it, reverting to the original code tends not to reintroduce it. So... good luck. If you can get a MWE to Microsoft, that would help us all. – Zsar Feb 04 '18 at 17:02