
Consider two typedefs:

```c
struct A { int member; };
typedef struct A TA;
typedef struct B { int b; } TB;
```

One can ask libclang for the type underlying each typedef using `clang_getTypedefDeclUnderlyingType(CXCursor)`. In both cases the result is a `CXType` with kind `CXType_Elaborated`.

Question: Given those elaborated type nodes, how do I distinguish a mere declaration (like the `struct A` in `TA`) from a definition (like the `struct B { int b; }` in `TB`)?

The possibly relevant libclang functions appear to be:

- `CXType clang_Type_getNamedType(CXType)`
- `CXCursor clang_getTypeDeclaration(CXType)`
- `unsigned clang_Type_visitFields(CXType, CXFieldVisitor, CXClientData)`

But I have not found a way to use these to distinguish the two kinds of typedefs. The distinction matters because I want to pretty-print the typedefs exactly as they were written in the program.

A.J.Rouvoet
  • All of these are definitions. A (forward) declaration would be `struct A;`. – Nelfeal Nov 09 '22 at 15:49
  • The `struct A` occurring as a child of the typedef TA is not a definition, or it would be rejected as a redefinition of the top-level `struct A { int member; }`. As explained, `clang_getTypedefDeclUnderlyingType` gives access to this CXType node, but we still need a way to identify that node as a mere declaration. Visiting the fields of the CXType node in the typedef visits the members of the top-level definition. – A.J.Rouvoet Nov 09 '22 at 16:09
  • My mistake, I misread the `clang_getTypedefDeclUnderlyingType` part. However, I believe a typedef declaration is different from a type declaration. I would be surprised if there is a way to distinguish between `TA` and `TB` in your example. What happens if you only declare `A` (with `struct A;`)? – Nelfeal Nov 09 '22 at 16:30
  • Also I'm a little confused when you say "the `struct A` occurring as a child of the typedef TA is not a definition". When you do `typedef struct A TA;`, you are not redeclaring `struct A`. You are only declaring `TA`, and `struct A` must have been declared beforehand. – Nelfeal Nov 09 '22 at 16:37
  • I would have agreed with you, except that it seems that most occurrences of `struct ` can be (and seem to be) treated as declarations. Consider that `typedef struct A TA` is also fine if `struct A` is only defined later. So treating it as a declaration which may or may not be forward-pointing is somewhat natural. Similarly, `struct A f() {}` is valid when `struct A` is not yet defined but will be later, and even `struct A { int i; } f() {}` is fine, further hinting that most occurrences of `struct A` are at least declarations (and may even be full definitions). – A.J.Rouvoet Nov 09 '22 at 16:51
  • Regardless of the "right way to think about it" the question remains how to make the distinction between one and the other using libclang. – A.J.Rouvoet Nov 09 '22 at 16:52
  • You're right, `typedef struct A TA;` does declare `struct A`. TIL. And of course a definition is also a declaration. What a definition isn't, is a forward declaration. In any case, I asked what happens if you only forward-declare `A` because looking into `CXType_Elaborated` gives me the impression that it has nothing to do with declarations vs definitions. – Nelfeal Nov 09 '22 at 17:02
  • Nothing much happens, but in that case the problem is not so apparent because it would seem that we could use `clang_getTypeDeclaration` on the CXType node under the typedefs and print that. It would then appear as if we recovered the original program. If we do that on the original example we end up with two definitions of the struct A. – A.J.Rouvoet Nov 09 '22 at 18:04

1 Answer


According to cppreference (yes, it's C++, but I believe it's the same in C), a forward declaration is a special case of an "elaborated type specifier", so it isn't surprising that the child node of TA is a CXType_Elaborated. A forward-declared type that has no definition in a given translation unit is known as an "opaque type".

Looking at CXTypeKind, there does not seem to be a way to distinguish between (forward) redeclarations and definitions within a typedef declaration. The two may very well be the same in the abstract syntax tree.

However, you might be able to achieve what you want by calling clang_isCursorDefinition on the child of the typedef. You can also check out clang_getCursorDefinition to find where the definition is.

Nelfeal
  • Unfortunately, clang_isCursorDefinition operates on a CXCursor, not on a CXType. It is possible to obtain the corresponding cursor in this particular case, but in general the same ambiguity can occur in many places, e.g. nested in CXTypes returned from other libclang functions, in which case there is no obvious corresponding CXCursor. – A.J.Rouvoet Nov 09 '22 at 18:00
  • @A.J.Rouvoet I'm no expert on libclang, but it seems to me like you are supposed to traverse all `CXCursor` nodes if you want to do something as detailed as printing back the original code. – Nelfeal Nov 09 '22 at 18:13
  • The same thought struck me as well. The problem there is that finding the representative cursors is not so easy with the API either. Consider a function declaration. The api provides `CXType clang_getResultType(CXCursor)`, but no corresponding function returning a `CXCursor`. One can get the children, but selecting the right child is oftentimes tricky, as the presence of certain children varies depending on the shape of the AST. – A.J.Rouvoet Nov 09 '22 at 19:09
  • I tried navigating the cursors instead today, but unfortunately not all type information is accessible that way. For example, `typedef int myint` has no children, whereas `typedef struct A *myPointer` has the `struct A` TypeRef cursor as a *direct* child. – A.J.Rouvoet Nov 10 '22 at 20:44
  • @A.J.Rouvoet I guess that's because it's sufficient information for an AST. Again I'm no expert, but it's very possible that multiple different sequences of tokens (different source codes) can result in the same AST. That would obviously make it impossible to get back the original code from its AST. – Nelfeal Nov 10 '22 at 22:15
  • Sorry, that was unclear: what I meant was `typedef struct A *myPointer` has `struct A` as the *only child*. That is, the traversal entirely skips the structure of the pointer. My feeling is that libclang is just unsuitable for my requirements and I should look into libtooling. Yet, it is still strangely inconsistent what information can be accessed through libclang. – A.J.Rouvoet Nov 11 '22 at 11:04
  • @A.J.Rouvoet Well, `myPointer` is a pointer, but that information is given by the type of the cursor, right? Then the type it points to would be the only child; that makes sense to me. If what you find strange is that this child has no children (which a `struct` should have), it's probably because you would have already traversed the struct members earlier, at the position given by `clang_getTypeDeclaration`. Anyway, I'm afraid I'm out of my depth here. My guess is that you want a lexer, which keeps almost everything (kind of like lossless compression for code), not a parser. – Nelfeal Nov 11 '22 at 14:52