
Does C have anything similar to C++, where one can place structs in an unsigned char buffer, as shown in the C++ standard, sec. 6.7.2?

template<typename ...T>
struct AlignedUnion {
  alignas(T...) unsigned char data[max(sizeof(T)...)];
};
int f() {
  AlignedUnion<int, char> au;
  int *p = new (au.data) int;           // OK, au.data provides storage
  char *c = new (au.data) char();       // OK, ends lifetime of *p
  char *d = new (au.data + 1) char();
  return *c + *d;                       // OK
}

In C I can certainly memcpy a struct (or an int, as shown above) into an unsigned char buffer, but accessing it afterwards through a pointer to that type runs into strict aliasing violations, because the buffer has a different declared type.

So suppose one wanted to replicate the second line of f in the C++ code above in C. One would do something like this:

#include<string.h>
#include<stdio.h>
struct Buffer {
  unsigned char data[sizeof(int)];
};

int main()
{ 
  struct Buffer b;
  int n = 5;
  int* p = memcpy(&b.data,&n,sizeof(int));
  printf("%d",*p);  // aliasing violation here as unsigned char is accessed as int
  return 0;
}

Unions are often suggested, i.e. union Buffer {int i; unsigned char b[sizeof(int)];}; but this is not quite as nice if the aim of the buffer is to act as storage (i.e. placing differently sized types in there by advancing a pointer into the buffer to the free part, plus potentially some more for proper alignment).

  • The typical handling in c is to use macro or functions for correct initialization. – πάντα ῥεῖ Jan 15 '22 at 17:00
  • The example maybe doesn't really show what you intend it to show. The `int` object is never accessed, and accessing the `unsigned char` as a `char` or vice versa wouldn't be a problem in either C or C++. The latter `char` objects are different objects than at the beginning, but I am not sure that notion is relevant in C. The second example in the linked paragraph seems more relevant. – user17732522 Jan 15 '22 at 17:03
  • I'm just going to assume that in c++ it would be ok to actually access the pointer p in `int *p = new (au.data) int;`, is that an incorrect assumption? – anonymouscoward Jan 15 '22 at 17:11
  • Why don't you declare `struct Buffer` instead as a `union`? – Neil Jan 15 '22 at 17:11
  • @anonymouscoward Yes, but the example doesn't demonstrate that. – user17732522 Jan 15 '22 at 17:12
  • @Neil Although OP doesn't mention it in the question, one difference is that you can dynamically decide the location of the new object in the buffer, which I suppose is possible only in compile-time fixed constellations with unions. I am not sure whether OP intends to ask only about the straight-forward case that can be easily solved with a union, or more generally. – user17732522 Jan 15 '22 at 17:14
  • @Neil: That is something that does work on many (most?) platforms, but is not strictly compliant. (By the standard, you may only read that type from a union that was last written into it; reading some other type is not well-defined.) A type may be memcpy'd to `unsigned char[]` and back. Nothing else is guaranteed. – DevSolar Jan 15 '22 at 17:16
  • @Neil unions are alright but become pretty hairy once you want to place several things after each other, maybe in varying order on various runs. – anonymouscoward Jan 15 '22 at 17:17
  • I mean, simpler than that, `union buffer { unsigned char c[sizeof(int)]; int i; } . . . int *p = &b.i`; but it would not have an automatic switch if you wanted to store multiple types. – Neil Jan 15 '22 at 17:23
  • @Neil I think a better example for "providing storage" not so easily captured with a union: Suppose the `unsigned char` buffer is supposed to be used as a backing for a custom allocator that may be used to allocate memory for objects of different types at the same location during its lifetime. – user17732522 Jan 15 '22 at 17:30
  • @user17732522 Maybe a stack? To avoid the change-making problem, always define a lowest possible (byte.) `struct { size_t bytes; union { unsigned char byte[sizeof(int)]; int d; } }`, `b.bytes += b.bytes % sizeof(int)`. – Neil Jan 15 '22 at 17:47
  • Why do you think `printf("%d", *p);` is an aliasing violation? `p` is an `int` pointer -- that's fine. Access through a character type is a specific exception to the strict aliasing rule as is the use of `memcpy` setting the *effective-type* for subsequent access to the type copied *from* (for subsequent accesses that do not modify the object). See [6.5 Expressions (p 6-7)](http://port70.net/~nsz/c/c11/n1570.html#6.5p6) (worst written two paragraphs of the standard ....) – David C. Rankin Jan 15 '22 at 17:48
  • @Neil What if I want to have it hold a number of packed `int`s at first, then later a number of packed `double`s and then at another point in time some combination of pointers to some `struct A` and `struct B`, basically as a user implementation of `malloc` would need to. OP should have given such an example that makes clearer use of the concept, but I think the question boils down to whether or not it is possible to change the effective type of some storage in C dynamically. – user17732522 Jan 15 '22 at 17:55
  • @DavidC.Rankin I think whether it is possible to change the effective type is basically what OP's question boils down to. I think the question is whether the quoted section of the standard really allows treating the subarray of the buffer as the target type after the memcpy. (i.e. what does "for subsequent accesses that do not modify the object" imply?) – user17732522 Jan 15 '22 at 17:59
  • @user17732522 Far be it from me to claim to be an expert on that poorly worded section. We have had a number of lively discussions on this site about just what the hell the writers intended - without a firm consensus. But I think for the `memcpy`, `memmove` part it means just what it says. If your *from* type is a type (as opposed to `void*`), then the effective type for subsequent access becomes the type that was copied *from*. In the case of `void*`, then the destination type is used (that's all you have). Whether the pointer points to the start or to an offset within a `char` type - it's ok. – David C. Rankin Jan 15 '22 at 21:00
  • It's ub if taking the standard literally though it will work everywhere. See https://stackoverflow.com/a/44507851/4989451 – tstanisl Jan 15 '22 at 21:23
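
For reference, the memcpy round trip mentioned in the comments above (copying an object into `unsigned char[]` and back out again) can be sketched as a small bump allocator. This is only a sketch under stated assumptions: `struct arena`, `arena_reserve`, `arena_store`, `arena_load`, `struct point` and `arena_demo` are hypothetical names that do not appear in the question, and the buffer size of 256 is arbitrary. The buffer is never accessed through an `int *` or a struct pointer; every typed access goes through `memcpy` into or out of an ordinary object, which is the pattern the comments describe as guaranteed.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical arena type: fixed storage plus the offset of the first free byte. */
struct arena {
    alignas(max_align_t) unsigned char data[256];
    size_t used;
};

/* Reserve `size` bytes at an offset that is a multiple of `align`.
   `align` must be a power of two. Returns the offset into a->data,
   or (size_t)-1 if the request does not fit. */
static size_t arena_reserve(struct arena *a, size_t size, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    size_t start = (a->used + align - 1) & ~(align - 1);  /* round up */
    if (start > sizeof a->data || size > sizeof a->data - start)
        return (size_t)-1;
    a->used = start + size;
    return start;
}

/* Copy an object's bytes into the buffer at a reserved offset. */
static void arena_store(struct arena *a, size_t off, const void *src, size_t size)
{
    memcpy(a->data + off, src, size);
}

/* Copy bytes back out into an object of the caller's type. */
static void arena_load(const struct arena *a, size_t off, void *dst, size_t size)
{
    memcpy(dst, a->data + off, size);
}

/* Illustrative struct, not from the question. */
struct point { double x; int tag; };

/* Places a char, an int and a struct one after another, then reads
   the int and the struct back without dereferencing a typed pointer
   into the buffer. */
static int arena_demo(void)
{
    struct arena a = { .used = 0 };
    char c = 'x';
    int n = 5;
    struct point pt = { 1.5, 7 };

    size_t oc = arena_reserve(&a, sizeof c, alignof(char));
    size_t on = arena_reserve(&a, sizeof n, alignof(int));
    size_t op = arena_reserve(&a, sizeof pt, alignof(struct point));

    arena_store(&a, oc, &c, sizeof c);
    arena_store(&a, on, &n, sizeof n);
    arena_store(&a, op, &pt, sizeof pt);

    int n2;
    struct point pt2;
    arena_load(&a, on, &n2, sizeof n2);
    arena_load(&a, op, &pt2, sizeof pt2);
    return n2 + pt2.tag;
}
```

The cost of this approach is the one raised in the comments: every read and write is a copy, and updating a single member means copying the struct out, changing it, and copying it back (or using `offsetof` to copy just that member).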

3 Answers


Have you tried using a union?

#include <string.h>
#include <stdio.h>

union Buffer {
    int int_;
    double double_;
    long double long_double_;
    unsigned char data[1];
};

int main() {
    union Buffer b;
    int n = 5;
    int *p = memcpy(&b.data, &n, sizeof(int));
    printf("%d", *p);  // aliasing violation here as unsigned char is accessed as int
    return 0;
}

The union aligns the data member according to the member type with the greatest alignment requirement.
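
As a rough check of that alignment claim, the properties can be stated as compile-time assertions. This is a sketch, not part of the answer's program: the helper `buffer_members_share_address` is a hypothetical name added for illustration, and the union is repeated here only to keep the snippet self-contained.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>

union Buffer {
    int int_;
    double double_;
    long double long_double_;
    unsigned char data[1];
};

/* The union must be suitably aligned for every one of its members. */
static_assert(alignof(union Buffer) >= alignof(int), "aligned for int");
static_assert(alignof(union Buffer) >= alignof(double), "aligned for double");
static_assert(alignof(union Buffer) >= alignof(long double), "aligned for long double");

/* It is also at least as large as its largest member. */
static_assert(sizeof(union Buffer) >= sizeof(long double), "large enough");

/* Every member starts at offset 0. */
static_assert(offsetof(union Buffer, data) == 0, "data starts at offset 0");

/* Hypothetical helper: returns 1 if data and int_ start at the union's address. */
static int buffer_members_share_address(void)
{
    union Buffer b;
    return (void *)&b == (void *)&b.data && (void *)&b == (void *)&b.int_;
}
```

This only addresses alignment and layout; it says nothing about which member may be read after another has been written.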

Bo R
  • This will work, but how would you solve the problem of using an unsigned char buffer as storage for an allocator without triggering strict aliasing issues? I can't think of any way of storing a variety of e.g. structs in an unsigned char buffer, besides writing to them using memcpy, and maybe offsetof if I want to write to just one data member. Which can become a pita fast – anonymouscoward Jan 15 '22 at 17:48
  • A call like `union Buffer * ub = malloc(13);` will give the correct alignment since `malloc` guarantees *the allocated memory, which is suitably aligned for any built-in type.* (if I understand your question correctly). – Bo R Jan 15 '22 at 17:54
  • Wouldn't it be simpler `int *p = &b.int_; *p = n;`? – Neil Jan 15 '22 at 18:00
  • @Bo R The question here is how to use storage that is fixed at compile time, such as unsigned char buffers. Alignment isn't a problem, as one can get the same guarantees as from malloc by doing something like alignas(max_align_t) unsigned char buffer[size]; (one must, however, calculate the proper offset into it for each thing placed in sequence, so that everything placed into it is aligned). My question basically is how you would use an unsigned char buffer as allocator storage while avoiding strict aliasing issues (as seems to be possible in C++ with placement new). – anonymouscoward Jan 15 '22 at 18:02
  • Possibly in this case, but `n` might come from an unaligned memory address you received, and you want to make sure it's aligned before dereferencing that memory location. – Bo R Jan 15 '22 at 18:03
  • @anonymouscoward It would be easier to answer with an expanded example. But it would likely involve a function calculating the nearest correctly aligned address for any given address. The `_Alignof( type-name )` operator (from C11) would help in implementing this function. – Bo R Jan 15 '22 at 18:24
  • @BoR OP's concern is not alignment, but (strict) aliasing: https://port70.net/~nsz/c/c11/n1570.html#6.5p7 – user17732522 Jan 15 '22 at 18:43
  • @user17732522 that explains my confusion. – Bo R Jan 15 '22 at 18:45

Yes, because of the strict aliasing rule it is just not possible, just as it is not possible to write a standard-compliant malloc().

Your buffer is not aligned - alignas(int) from stdalign.h needs to be added.

If you want to protect against compiler optimizations, either:

  • just cast the pointer, access it, and compile with -fno-strict-aliasing, or use volatile
  • or move the accessor for the buffer to another file that is compiled without LTO, so that the compiler is simply unable to optimize across it.

// mybuffer.c
#include <stdalign.h>
alignas(int) unsigned char buffer[sizeof(int)];
void *getbuffer(void) { return buffer; }


// main.c
#include <string.h>
#include <stdio.h>
#include "mybuffer.h"  // not shown; assumed to declare: void *getbuffer(void);
int main() {
  void *data = getbuffer();
  // int *p = new (au.data) int;           // OK, au.data provides storage
  int *p = data;
  // char *c = new (au.data) char();       // OK, ends lifetime of *p
  char *c = data;
  *c = 0;
  // char *d = new (au.data + 1) char();
  char *d = (char*)data + 1;
  *d = 0;

  return *c + *d;
}
KamilCuk

The way the definition of Effective Type in 6.5p6 is written, it's unclear what it's supposed to mean in all corner cases--likely because there was never a consensus among Committee Members as to how all corner cases should be handled. Defect reports often add more confusion than clarity, since they use terms like the "active member" of a union when neither the Standard nor the defect reports specify what actions would set or change it.

If one wants to use an object of static or automatic duration as though it were a buffer without a declared type, a safe way of doing that should be to do something like the following:

void volatile *volatile dummy_vp;

void test(void)
{
  union {
    char dat[1000];
    unsigned long force_alignment;
  } buffer;
  void *volatile launder = buffer.dat;
  dummy_vp = &launder;
  void *storage_blob = launder;
  ...
}

Unless an implementation goes out of its way to test whether the read of launder happened to yield an address matching buffer.dat, it would have no way of knowing whether the object at that address had a declared type. Nothing in the Standard would forbid an implementation from behaving nonsensically if the address happened to match that of buffer.dat, but situations where performance improvements would justify the cost of the check aren't likely to be common enough for compilers to attempt such "optimization".

supercat