At the very first, apologies in advance for the very indistinct presentation of the question - if I knew what to ask for, I would probably know how to fix it. ... But I do not even have a faint theory, seven years of C++ experience notwithstanding. Any helpful pointers (hè) will be most appreciated.
- Possibly related to this question. (same symptoms)
- Not related to that one. (no explicit function pointers here)
Problem: A count of objects with certain properties, which themselves are checked by a member function of their class shows incorrect results. The source is that the check happens with gibberish values. The source of that is that the pointer "this" changes between calling the function and entering its body. Once the body is left, the pointer is again correct.
Sadly, the solution for the related question does not work here.
I am not able to produce a minimal example of the problem. Furthermore, literally hundreds of member functions are being called correctly in the same program, as far as I know without exhibiting this behaviour.
- What can I do? As a bandaid, replacing the function call with a copy of its body works, but this is no proper solution.
I am at a complete loss as to how to proceed with my diagnosis.
Concise: What steps can I follow to attain greater insight into the nature of the problem?
A short checklist of things already taken care of:
- The objects in question are properly initialised at the time of the call.
- All optimisations are off. No inlining. This is a debug build with the appropriate settings in effect.
- Cleaning and rebuilding the project has not yielded a different result.
- Recompiling with the original (but retyped) function call after the bandaid solution had been tested successfully led to a return of the problem.
- There are no compiler warnings in the compilation unit involved (warning level 3), specifically disabled project-wide are:
- C4005 (macro redefinition, due to using a custom/hacked Windows SDK for compatibility reasons - this was originally a Win95 program)
- C4244 (implicit cast to smaller type, due to legacy code waiting to be refactored - those are all float-to-int conversions that lack an explicit cast, all 800+ instances of them)
- C4995 (calling function marked with #pragma deprecated, due to C-lib functions being called without preceding underscore - hoping to eventually switch to GCC)
- "control flow guard" and "basic runtime checks" are enabled, but do not trigger.
And a hint that may or may not be relevant, but which I cannot interpret at the moment:
- For the very first hex, IsSea is called normally, that is: "this" inside is identical to "this" outside
- Only in all hexes that follow does the bug happen.
- The altered "this" pointer does not point to the first hex though, it seems to hit unallocated space.
Here is an extract of how it looks like:
Header:
// These three are actually included from other files.
constexpr unsigned int AREA_none = 0u;
constexpr unsigned int AREA_Lake = 1u;
enum Terrain
{
OCEAN = 8,
TERRA_INCOGNITA = 10
};
class CHex
{
public:
CHex(); // initialises ALL members with defined error values
bool ReadHex(); // Init-type function calling the problematic one
bool IsSea() const // problematic function
{ return this->_Area != AREA_none && this->_Area != AREA_LAKE && this->_nTerrain == Terrain::OCEAN; }
// The body does the right thing - just WITH the wrong thing.
protected:
unsigned int _Area;
int _nNavalIndex;
Terrain _nTerrain;
static int _nNavalCount = 0;
// There are a lot more functions in here, both public and protected.
// The class also inherits a bunch from three other classes, but no virtual functions and no overlaps are involved.
}
Source:
CHex::CHex() : _Area{0u}, _nNavalIndex{0}, _nTerrain{Terrain::TERRA_INCOGNITA}
{}
bool CHex::ReadHex()
{
// Calls a lexer/parser pair to procure values from several files.
// _Area and _nTerrain are being initialised in this process.
// All functions called here work as expected and produce data matching the source files.
// _Area and _nTerrain have the correct values seen in the source files at this point.
if(this->IsSea()) // but inside that it looks as if they were uninitialised
// This ALWAYS happens because the function always returns true.
_nNavalIndex = _nNavalCount++;
// Stopping at the next instruction, all values are again correct
// - with the notable exception of the two modified by the instruction that should not have happened.
// If I replace that with the following, I receive the correct result:
/*
// direct copy of the function's body
if(this->_Area != AREA_none && this->_Area != AREA_Lake && this->_nTerrain == Terrain::OCEAN)
_nNavalIndex = _nNavalCount++; // only happens when it should; at the end, the count is correct
*/
// Sanity checks follow here.
// They too work correctly and produce results appropriate for the data.
return true; // earlier returns exist in the commented-out parts
}
Sorry again for this big mess, but well, right now I am a mess. It's like seeing fundamental laws of physics change.
--
On advice from @Ben Voigt I hacked in a diagnostic that dumps the pointers into a file. Behold:
Before ReadHex: 20A30050 (direct array access) On ReadHex: 20A30050 On IsSea: 20A30050 (with members: 0, 8) After ReadHex: 20A30050
Before ReadHex: 20A33EAC (direct array access) On ReadHex: 20A33EAC On IsSea: 20A33EAC (with members: 2, 0) After ReadHex: 20A33EAC
Before ReadHex: 20A37D08 (direct array access) On ReadHex: 20A37D08 On IsSea: 20A37D08 (with members: 2, 0) After ReadHex: 20A37D08
Before ReadHex: 20A3BB64 (direct array access) On ReadHex: 20A3BB64 On IsSea: 20A3BB64 (with members: 3, 0) After ReadHex: 20A3BB64
Before ReadHex: 20A3F9C0 (direct array access) On ReadHex: 20A3F9C0 On IsSea: 20A3F9C0 (with members: 4, 3) After ReadHex: 20A3F9C0
Before ReadHex: 20A4381C (direct array access) On ReadHex: 20A4381C On IsSea: 20A4381C (with members: 3, 0) After ReadHex: 20A4381C
[...]
They are all correct. Every single one of them. And even better: The function now evaluates correctly! Here is the changed source (I am omitting the comments this time):
Header:
// These three are actually included from other files.
constexpr unsigned int AREA_none = 0u;
constexpr unsigned int AREA_Lake = 1u;
enum Terrain
{
OCEAN = 8,
TERRA_INCOGNITA = 10
};
extern FILE * dump;
class CHex
{
public:
CHex();
bool ReadHex();
bool IsSea() const {
fprintf(dump, "\tOn IsSea:\t%p (with members: %u, %i) ", (void*)this, this->_Area, this->_nTerrain);
return this->_Area != AREA_none && this->_Area != AREA_LAKE && this->_nTerrain == Terrain::OCEAN; }
protected:
unsigned int _Area;
int _nNavalIndex;
Terrain _nTerrain;
static int _nNavalCount = 0;
// lots more functions and attributes
}
Source:
CHex::CHex() : _Area{0u}, _nNavalIndex{0}, _nTerrain{Terrain::TERRA_INCOGNITA}
{}
bool CHex::ReadHex()
{
fprintf(dump, "On ReadHex:\t%p ", (void*)this);
// Calls a lexer/parser pair to procure values from several files.
// _Area and _nTerrain are being initialised in this process.
if(this->IsSea()) // Suddenly works!?
_nNavalIndex = _nNavalCount++;
// Sanity checks follow here.
fprintf(dump, "After ReadHex:\t%p ", (void*)this);
return true;
}
The additional outputs (as well as the initialisation and closing of dump) come from the next higher level in the control flow, another function in another class, where the loop over all hexes resides. I omitted that for now, but will add it if someone thinks it's important.
And apropos that. It now looks to me as if this fault were a result of bugs in the tools, not in the code. As a matter of fact, even though the function now evaluates correctly, the debugger still shows the wrong pointer from before and its nonsensical members.