
Potentially related: questions 30970251 and 7687082.

I am considering writing a memory allocator and am trying to work out how to navigate the restrictions modern C puts on type punning and aliasing. I think I'm in the clear as long as the buffer underlying the allocator was originally obtained from malloc, because memory allocated by malloc has no declared type.

An over-aligned char buffer, on the other hand, does have a declared type. I don't think I can cast a pointer into it to an arbitrary type; instead I must carefully write to it through a char pointer, e.g. using memcpy. This is painful because I can't see a way to hide this memcpy workaround from the caller.

Consider the following:

#include <assert.h>
#include <stdalign.h>
#include <stdint.h>
#include <string.h>

static_assert(sizeof(double) == sizeof(uint64_t), "");
static_assert(alignof(double) == alignof(uint64_t), "");

int main(void)
{
  alignas(alignof(double)) char buffer[sizeof(double)];
  // effective type of buffer is char [8]

  {
    double x = 3.14;
    memcpy(&buffer, &x, sizeof(x));
    // effective type of buffer is now double
  }

  {
    uint64_t* ptr = (uint64_t*)&buffer;
    // effective type of buffer is still double
    // reading from *ptr would be undefined behaviour
    uint64_t y = 42;
    memcpy(ptr, &y, sizeof(y));
    // effective type of buffer is now uint64_t
  }

  {
    double* ptr = (double*)&buffer;
    // effective type of buffer is still uint64_t
    uint64_t retrieve = *(uint64_t*)ptr;  // OK
    assert(retrieve == 42);

    double one = 1.0;
    *ptr = one;  // Unsure if OK to dereference pointer of wrong type
    // What is the effective type of buffer now?
    assert(*ptr == one);
  }
}

This is workable in that I can diligently ensure that, every time a custom allocator returns a void pointer, the memory is first written with memcpy instead of through a pointer cast to the desired type. That is, replace

double * x = my_malloc(sizeof(double));
*x = 3.14;

with:

double tmp = 3.14;
void * y = my_malloc(sizeof(double));
memcpy(y, &tmp, sizeof(double));
double * x = (double*)y;

The optimiser removes all of this line noise, but it does look silly. Is it necessary for standards compliance?

This can definitely be solved by writing the allocator in asm instead of C, but I'm not especially keen to do so. Please let me know if the question is underspecified.

Jon Chesterfield

1 Answer


No, not in general. Copying bytes in with memcpy (or storing through an lvalue of some type) only changes the effective type of objects that have no declared type, that is, objects allocated through malloc and friends.

So if you do such things as a user of a compiler and library implementation, the behavior of your program is undefined. An array declared as char[] always has that type as its effective type.

If you are a compiler or library writer, you are not bound by these restrictions. You just have to convince your toolchain not to optimize too aggressively. Typically you could do that by ensuring that your allocator function lives in a translation unit (TU) of its own that only exports a void*, and by making sure that link-time optimization (or anything similar) is not switched on.

If you provide such a function as part of a C library (or a replacement for one), it is then you, as the implementor, who must give these guarantees to your users.

Jens Gustedt
  • Thanks. That's essentially what I feared. Potentially bad news for statically linking a libc with LTO enabled. Definitely bad news for application level allocators. I'm increasingly sure that simplifying TBAA isn't worth the price of turning clear developer intent into UB. – Jon Chesterfield Dec 23 '16 at 00:22
  • @JonChesterfield: I strongly suspect that the authors of C89 intended that compiler writers interpret it as saying that compilers need not pessimistically assume that aliasing may occur *in cases where nothing in the code would suggest it*, but not that quality compilers should ignore aliasing in cases where it's obvious, especially on platforms where it could be useful. Given `uint32_t float_as_bits(float *fp) { return *(uint32_t*)fp; }`, would it really be "pessimistic" for a compiler to presume that the function might access something of type `float`? – supercat Jan 09 '17 at 22:57
  • @supercat Perhaps. It's certainly a nuisance. Unfortunately this is the way C++ is going and C seems to be following. On the bright side, generating machine code directly is still an option. – Jon Chesterfield Jan 09 '17 at 23:19
  • @JonChesterfield: Back in the early 1990s, I thought it was silly that C code should need to use memcpy to copy byte-counted arrays rather than a proper array-copy function. I never would have guessed that future C compilers would evolve in the direction of requiring `memcpy` more often. I also find bizarre the attitude of compiler writers that programmers should write in assembly if they want control, when *one of the main purposes for which C was invented was to allow low-level programming without assembly language*. – supercat Jan 09 '17 at 23:22