
A common pattern when providing a C API is to forward-declare some opaque types in the public header, pass pointers to them through the API functions, and then reinterpret_cast those pointers to the real C++ types inside the translation unit (and therefore back in C++ land).

Using LLVM as an example:

In Types.h this typedef is declared:

typedef struct LLVMOpaqueContext *LLVMContextRef;

LLVMOpaqueContext is not referenced anywhere else in the project.

In Core.h the following function is declared:

LLVMContextRef LLVMContextCreate(void);

It is defined in Core.cpp:

LLVMContextRef LLVMContextCreate() {
  return wrap(new LLVMContext());
}

wrap and unwrap are defined by a macro in CBindingWrapping.h:

#define DEFINE_SIMPLE_CONVERSION_FUNCTIONS(ty, ref)     \
  inline ty *unwrap(ref P) {                            \
    return reinterpret_cast<ty*>(P);                    \
  }                                                     \
                                                        \
  inline ref wrap(const ty *P) {                        \
    return reinterpret_cast<ref>(const_cast<ty*>(P));   \
  }

The macro is then invoked in LLVMContext.h:

DEFINE_SIMPLE_CONVERSION_FUNCTIONS(LLVMContext, LLVMContextRef)

So the C API takes a pointer to an LLVMOpaqueContext and casts it to a pointer to llvm::LLVMContext in order to perform whichever operation is called on it.
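To make the pattern concrete, here is a minimal self-contained sketch of the same technique. The names `MyOpaqueCounter`, `MyCounterRef`, `impl::Counter` and the `MyCounter*` functions are hypothetical, not LLVM's; only the macro is copied from the question. In a real project the `extern "C"` block would live in a header included from C, and the rest in a C++ source file.

```cpp
#include <cassert>

// "Public header" part: C callers only ever see a pointer to an incomplete type.
extern "C" {
typedef struct MyOpaqueCounter *MyCounterRef;  // hypothetical; mirrors LLVMContextRef
MyCounterRef MyCounterCreate(void);
void MyCounterIncrement(MyCounterRef c);
int MyCounterGet(MyCounterRef c);
void MyCounterDispose(MyCounterRef c);
}

// Implementation part (C++ only). MyOpaqueCounter is never defined anywhere.
namespace impl {
class Counter {
 public:
  void increment() { ++value_; }
  int get() const { return value_; }

 private:
  int value_ = 0;
};
}  // namespace impl

// The macro from CBindingWrapping.h, as quoted in the question.
#define DEFINE_SIMPLE_CONVERSION_FUNCTIONS(ty, ref)    \
  inline ty *unwrap(ref P) {                           \
    return reinterpret_cast<ty *>(P);                  \
  }                                                    \
                                                       \
  inline ref wrap(const ty *P) {                       \
    return reinterpret_cast<ref>(const_cast<ty *>(P)); \
  }

DEFINE_SIMPLE_CONVERSION_FUNCTIONS(impl::Counter, MyCounterRef)

// C entry points: convert the handle back to the real C++ type and use it.
extern "C" MyCounterRef MyCounterCreate(void) { return wrap(new impl::Counter()); }

extern "C" void MyCounterIncrement(MyCounterRef c) {
  assert(c != nullptr && "null handle");
  unwrap(c)->increment();
}

extern "C" int MyCounterGet(MyCounterRef c) { return unwrap(c)->get(); }

extern "C" void MyCounterDispose(MyCounterRef c) { delete unwrap(c); }
```

A C caller sees only the `extern "C"` declarations and passes `MyCounterRef` values around without knowing what they point to.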

My question is: isn't this a violation of the strict aliasing rules? If not, why not? And if so, how can this kind of abstraction at the public interface boundary be achieved legally?

Drise
Sam Kellett



It's not a strict aliasing violation. To start with, strict aliasing is about accessing an object via a glvalue of the wrong type.

In your question, you create an LLVMContext and then use an LLVMContext lvalue to access it. No illegal aliasing there.
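For contrast, here is a sketch of what strict aliasing does forbid: reading an object through a glvalue of an unrelated type. The function name `floatBits` is hypothetical and the sketch assumes a 32-bit `float`. The offending read appears only in a comment; the code uses the well-defined alternative of copying the bytes with `std::memcpy` (C++20 also offers `std::bit_cast`).

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

static_assert(sizeof(float) == sizeof(std::uint32_t), "sketch assumes a 32-bit float");

// Returns the bit pattern of a float as an unsigned 32-bit integer.
std::uint32_t floatBits(float f) {
  // A strict aliasing violation (undefined behavior) would be:
  //   return *reinterpret_cast<std::uint32_t *>(&f);
  // because it *accesses* the float object through a uint32_t glvalue.
  // Merely forming that pointer is not the problem; the read is.

  // Well-defined: copy the object representation into a real uint32_t object.
  std::uint32_t bits;
  std::memcpy(&bits, &f, sizeof bits);
  return bits;
}
```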

The only issue that could arise is if the pointer conversion didn't yield back the same pointer. But that is not a problem either, since reinterpret_cast is guaranteed to give back the original pointer in a round-trip conversion, as long as the intermediate pointer type points to a type whose alignment requirement is no stricter than that of the original type.
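A small sketch of that round trip, using hypothetical types `Thing` and `OpaqueThing` (the latter declared but never defined, like LLVMOpaqueContext). The pointer passes through an unrelated pointer type, but the object is only ever accessed as a `Thing`, so no aliasing rule comes into play.

```cpp
#include <cassert>
#include <cstdint>

struct OpaqueThing;  // hypothetical; declared but never defined

struct Thing {
  std::int64_t value;
};

// Converting Thing* to an unrelated pointer type changes only the static type;
// nothing is read or written through the OpaqueThing* handle.
OpaqueThing *toHandle(Thing *p) { return reinterpret_cast<OpaqueThing *>(p); }

// Converting back yields the original pointer value, and the object is then
// accessed through a glvalue of its own type, Thing.
std::int64_t readValue(OpaqueThing *h) {
  assert(h != nullptr);
  Thing *p = reinterpret_cast<Thing *>(h);
  return p->value;
}
```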

Whether this is a good or bad way to go about things is debatable. Personally, I would not bother with LLVMOpaqueContext and would simply return a struct LLVMContext*. It's still an opaque pointer, and it doesn't matter that the C header declares it with struct while the type definition uses class; the two keywords are interchangeable up to the point of the type definition.
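A minimal sketch of that alternative, with hypothetical names (`Gadget`, `GadgetRef` and the `Gadget*` functions): the C-visible part forward-declares the real type, so no conversion functions or casts are needed. It assumes the type lives in the global namespace. C has no namespaces, so a C header's `struct LLVMContext` would name a global type distinct from llvm::LLVMContext; a namespaced class would still need a global-namespace type or a cast as in the question.

```cpp
#include <cassert>

// "Public header" part: the C side sees only an incomplete type.
extern "C" {
typedef struct Gadget *GadgetRef;  // hypothetical; no separate opaque struct
GadgetRef GadgetCreate(void);
int GadgetNextId(GadgetRef g);
void GadgetDispose(GadgetRef g);
}

// Implementation part: the very same global-namespace type is completed here.
// Writing `class Gadget` would also be legal; Clang may warn about the
// mismatched keyword (-Wmismatched-tags).
struct Gadget {
  int next = 0;
  int nextId() { return next++; }
};

// No reinterpret_cast anywhere: GadgetRef already is Gadget*.
extern "C" GadgetRef GadgetCreate(void) { return new Gadget(); }

extern "C" int GadgetNextId(GadgetRef g) {
  assert(g != nullptr);
  return g->nextId();
}

extern "C" void GadgetDispose(GadgetRef g) { delete g; }
```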

StoryTeller - Unslander Monica
  • @HolyBlackCat You said "*all pointers (apart from function pointers) are guaranteed to have same size and representation*". Since when? – melpomene Mar 12 '18 at 11:33
  • @melpomene I'm not sure about representation, but more or less sure about the size. I'll look it up. – HolyBlackCat Mar 12 '18 at 11:35
  • Even if they were, the idea that you can form invalid references as long as you don't dereference them is a myth. – Lightness Races in Orbit Mar 12 '18 at 11:39
  • @LightnessRacesinOrbit - I think the myth is perpetuated by a poor choice of name. "Strict accessing" just doesn't have the same ring to it as "strict aliasing". And the latter doesn't imply an access is required to even cause a problem. – StoryTeller - Unslander Monica Mar 12 '18 at 11:41
  • @StoryTeller: Because it isn't – Lightness Races in Orbit Mar 12 '18 at 11:42
  • @LightnessRacesinOrbit - Correct me if I'm wrong, but even this expression statement `*p;` for a pointer `p` is formally an access due to an lvalue-to-rvalue conversion. So long as you just lug a pointer around, you aren't doing anything overly sinister. – StoryTeller - Unslander Monica Mar 12 '18 at 11:43
  • @StoryTeller I can't talk about C++, but in C `void *p = malloc(42); free(p); if (p)` definitely is UB. – melpomene Mar 12 '18 at 11:48
  • @melpomene - Invalid pointer values are a thing in C++ as well. But I think language-lawyer-wise there's a difference with regard to the cause of the UB between a strict aliasing violation and invalid addresses being used (like you use them in your snippet). – StoryTeller - Unslander Monica Mar 12 '18 at 11:52
  • @melpomene In C++, after deallocation the pointer becomes invalid, which forbids it to be dereferenced but not read. Regardless, the UB coming from invalid pointers isn't strict aliasing, which as mentioned is caused due to accessing an object with a type different from the object – Passer By Mar 12 '18 at 12:14
  • @StoryTeller: Doesn't matter _why_ it's UB – Lightness Races in Orbit Mar 12 '18 at 12:59
  • @StoryTeller: Regarding `struct` vs. `class`: That is true, but Clang, with `-Wall`, will warn when the keywords are used inconsistently. This could be solved with an `#ifdef __cplusplus` I guess, to avoid the need for a `#pragma`. – Arne Vogel Mar 12 '18 at 13:13
  • @ArneVogel - I imagine Clang picked it up due to its interoperability with MSVC. I know Microsoft have their mangled names affected based on that. But I don't think this constrained case of returning a pointer is likely to cause a problem, so the warning can be turned off in the TU that defines the class. Thanks for bringing it up, however. I didn't actually know Clang behaved this way until now. – StoryTeller - Unslander Monica Mar 12 '18 at 13:18
  • `*p` doesn't do an l-to-r conversion on the pointee in C++ unless you have a pointer to volatile. – T.C. Mar 12 '18 at 13:57
  • @PasserBy But then you compare the (now invalid) pointer value with `NULL`. Is it not UB? – Joker_vD Mar 12 '18 at 14:33
  • @PasserBy : No, even *reading* an invalid pointer value is UB. (If you have a segment+offset architecture, and loading an invalid segment descriptor causes a trap, then just testing for NULL can blow up.) – Martin Bonner supports Monica Mar 12 '18 at 17:05
  • @HolyBlackCat : All pointers to `struct` are the same size and representation, but that is not true of other pointers. In particular, `char*` and `void*` have been different sizes to other pointers in historic implementations. – Martin Bonner supports Monica Mar 12 '18 at 17:07
  • @MartinBonner Do you have a standard reference for that? – HolyBlackCat Mar 12 '18 at 17:10
  • @HolyBlackCat : 3.9.2 p3 in [n4296](http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2014/n4296.pdf) "A pointer to cv-qualified (3.9.3) or cv-unqualified void can be used to point to objects of unknown type. **Such a pointer shall be able to hold any object pointer.** An object of type `cv void*` shall have the same representation and alignment requirements as `cv char*`." (my emphasis). Note that this dispensation is specific to char* and void*. – Martin Bonner supports Monica Mar 12 '18 at 17:22
  • @MartinBonner In C++14 [basic.stc.dynamic.deallocation] *"Indirection through an invalid pointer value and passing an invalid pointer value to a deallocation function have undefined behavior. Any other use of an invalid pointer value has implementation-defined behavior."* It _was_ UB back in C++11 – Passer By Mar 12 '18 at 17:40
  • @MartinBonner Also, in regards to lvalue-to-rvalue conversion [conv.lval] *"if the object to which the glvalue refers contains an invalid pointer value, the behavior is implementation-defined."* – Passer By Mar 12 '18 at 17:52
  • @PasserBy Oo! That's interesting. Presumably an implementation is free to define the behaviour as "may terminate the program without warning". – Martin Bonner supports Monica Mar 12 '18 at 20:32
  • @MartinBonnersupportsMonica: More interesting would be whether an "implementation-defined" action would be allowed to raise a signal before all observable actions preceding the pointer action in question have occurred, or cause the program to terminate without warning at any arbitrary time after the action occurred. So far as I can tell, older standards sought to use the phrase "Undefined Behavior" rather than "Implementation-Defined" behavior in all cases where such loosey-goosey semantics might be appropriate, including those where 99% of implementations should behave identically. – supercat May 25 '21 at 22:22
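The comments note that `void*` can hold any object pointer. A C API could therefore use plain `void*` handles instead of distinct opaque struct types, as in this sketch with a hypothetical `Widget` type and hypothetical helper names; converting back with `static_cast` yields the original pointer. The trade-off is type safety: every object pointer converts implicitly to `void*`, so passing the wrong kind of handle compiles silently, which distinct handle types such as LLVMContextRef prevent.

```cpp
#include <cassert>

struct Widget {  // hypothetical implementation type
  int id;
};

// Type-erased handle: any object pointer converts implicitly to void*.
void *widgetToHandle(Widget *w) { return w; }

// static_cast from void* back to the original type yields the original pointer.
Widget *handleToWidget(void *h) {
  assert(h != nullptr);
  return static_cast<Widget *>(h);
}
```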