0

I ran into a nasty schrödinbug recently. While trying to load a file into a flat memory representation, the author had written code like this:

class Line final { public:
    int stuff[3];
    char* data;
}

//...

Line* line = /*...*/;
//Trying to treat line->data like an array.  This is *wrong*.
line->data = reinterpret_cast<char*>(line) + 3*sizeof(int);

//...

line->data[0] = /*...*/
line->data[1] = /*...*/
//...
line->data[n] = /*...*/ //"line->data" changes because of this line!

So, what's happening is that the first lines of code essentially set line->data equal to &line->data. This is a mistake because any changes to values pointed to by line->data could also change what line->data itself is pointing to!

I found it curious then that it took so long for the problem to occur. My understanding is that, unless qualified with restrict (or for g++/MSVC __restrict) the compiler must assume that pointers are aliased. So if I set line->data[0] to be something, then it will be visible to the next access, line->data[1], and will almost certainly be invalid. In the debugger, however, the change was not visible until much later, and the writes continued happily for a time.

I'm guessing the compiler (in this case MSVC 2013) didn't consider self-aliasing to be possible. Is that allowed?

Deduplicator
  • 44,692
  • 7
  • 66
  • 118
geometrian
  • 14,775
  • 10
  • 56
  • 132
  • So is your question basically: "why didn't this crash on `line->data[1] = ...`?"? – Oliver Charlesworth Sep 26 '14 at 20:57
  • Yes. The write through `line->data[0]` should have been visible to the subsequent load of `line->data` when computing `line->data[1]`--unless aliasing doesn't include self-aliasing. – geometrian Sep 26 '14 at 20:59
  • Possibly because of padding, depending on the bitness of the system. – T.C. Sep 26 '14 at 21:00
  • Well, one thing that immediately comes to mind is alignment. Is this running on a 64-bit platform? If so, the calculation in the pointer arithmetic isn't taking padding into account. – Oliver Charlesworth Sep 26 '14 at 21:00
  • @OliverCharlesworth @T.C. This was indeed on a 64-bit platform. Can you clarify how padding applies? `sizeof(Line)==24`, so I'd guess there's a `4` byte pad immediately before `data`--this makes `line->data` slightly _before_ `&line->data`. Interestingly, although the change often happens around `4`, it can sometimes be higher. – geometrian Sep 26 '14 at 21:21

2 Answers2

0

My understanding is that, unless qualified with restrict (or for g++/MSVC __restrict) the compiler must assume that pointers are aliased.

This is incorrect. The compiler is permitted to assume that pointers only alias pointers that point to the same type, or pointers to char.

class X;
class Y;
X *ptr_x = ...;
Y *ptr_y = ...;
char *ptr_char = ...;

Here, the compiler can assume that ptr_x does not alias ptr_y. It cannot make assumptions about ptr_char, however.

Dietrich Epp
  • 205,541
  • 37
  • 345
  • 415
  • Last I checked MSVC doesn't support strict-aliasing, so I'm assuming it must assume all pointers are aliased. In any case the pointer in question here _is_ a `char` pointer. – geometrian Sep 26 '14 at 21:14
  • @IanMallett: Indeed. It is *possible* that this is a compiler bug. However, the undefined behavior would have to be removed from the program first. – Dietrich Epp Sep 26 '14 at 21:58
0

It's hard to know exactly what the problem was. I have long since fixed the problem and am several projects ahead by now. In retrospect, it seems the comments on the original question were most successful in giving clues to explain the behavior:

Possibly because of padding, depending on the bitness of the system.

and:

Well, one thing that immediately comes to mind is alignment. Is this running on a 64-bit platform? If so, the calculation in the pointer arithmetic isn't taking padding into account.

On the 64-bit architecture this was indeed being compiled on, my guess is that the class in the original question would be laid out in memory like so (types adjusted for clarity):

int32_t stuff_0;
int32_t stuff_1;
int32_t stuff_2;
//4 bytes of empty space
char* data;

The padding happens because the char* pointer needs to be 8-byte aligned. Since the first three ints take 3*32/8=96/8=12 bytes, to get that alignment, the compiler needs to insert an extra 4 bytes to bring the overhead to a round 16 bytes.

When data is initialized, it was incorrectly initialized to point to the beginning of the empty space. So, writes to data[n], 0<=n<4 hit the padding. It's only on accessing data[4] that we get a problem.

I say "most successful" since, while the issue did mostly occur around the fifth access, to my memory the problem sometimes occurred later, even while debugging. And, as I wrote, this was a schrödinbug (that is, a bug that should have occurred but didn't--and now that it has been observed, always does). I don't have data on the previous sort of data being run, but it's possible that perhaps the logic caused the critical pointer range to not be affected.

geometrian
  • 14,775
  • 10
  • 56
  • 132