
So I am developing a programming language that compiles to bytecode for VM execution and also to C as an intermediate language for producing native binaries. I chose C because it is low-level enough and portable, which saves a tremendous amount of effort: I can reuse existing compilers instead of writing an assembly backend for each and every platform and its oddities.

But existing compilers come with their drawbacks, one of which is the circular dependency issue. I want to solve circular dependencies in an elegant way (unlike C/C++) without awkward forward declarations, without having to use pointers and extra indirection and wasted memory, without having to separate declarations from definitions and so on... In other words, take this issue away from the developer like some programming languages do.

The way I see it, the main problem with current C/C++ compilers is that they cannot "look into the future", even though it is not really the future, since the programmer's intent is already expressed in code. My compiler does not have that issue: it is not limited to what it has parsed up to a certain point, so it knows the sizes of objects with circular dependencies and can calculate the appropriate offsets and such.

I've already implemented "faked" inheritance which simply does "inline expansion" of inherited structures' members, so I am thinking I can also use the same approach to actually fake aggregation as well. In the most basic and simple example:

typedef struct {
    int a;
} A;

typedef struct {
    A a;
    int b;
} B;

becomes:

typedef struct {
    int A_a;
    int b;
} B;

and the compiler does a bit of "translation":

B b;
b.a.a = 7;

becomes:

b.A_a = 7;

And in the same fashion, all structures are collapsed down to a single root structure which contains only primitive types. This way no structure uses a type whose size is not known in advance, so the order of definition becomes irrelevant. Naturally, this mess is hidden away from the user and is only for the compiler's "eyes to see", while the user side is kept structured and readable. And it goes without saying that the binary footprint is preserved for compatibility with regular C/C++ code: the collapsed structure is binary identical to a regular structure that would use aggregation or inheritance.
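A minimal sketch of how the claimed layout equivalence could be checked for the `A`/`B` example above. The names `B_nested`, `B_flat`, `flat_layout_matches`, and `write_flat_read_nested` are hypothetical and exist only for this illustration; the check says nothing about structs whose members need padding.

```c
#include <assert.h>
#include <stddef.h>   /* offsetof */
#include <string.h>   /* memcpy */

typedef struct { int a; } A;
typedef struct { A a; int b; } B_nested;       /* regular aggregation */
typedef struct { int A_a; int b; } B_flat;     /* collapsed form (hypothetical name) */

/* Returns 1 when the flattened struct has the same size and the same
   member offsets as the nested one, 0 otherwise. */
int flat_layout_matches(void)
{
    return sizeof(B_flat) == sizeof(B_nested)
        && offsetof(B_flat, A_a) == offsetof(B_nested, a) + offsetof(A, a)
        && offsetof(B_flat, b) == offsetof(B_nested, b);
}

/* Writes through the flattened name (b.A_a) and reads the same bytes
   back through the nested path (b.a.a). */
int write_flat_read_nested(int value)
{
    B_flat flat = { 0, 0 };
    B_nested nested;

    assert(sizeof flat == sizeof nested);
    flat.A_a = value;
    memcpy(&nested, &flat, sizeof nested);     /* copy the raw bytes, no pointer casts */
    return nested.a.a;
}
```

Calling `write_flat_read_nested(7)` from a test driver mirrors the `b.a.a = 7` to `b.A_a = 7` translation; `memcpy` is used instead of a pointer cast so the sketch does not depend on aliasing rules.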

So my question is: Does this sound like a sound idea? Anything that could go wrong I am missing?

EDIT: I only aim to solve the C/C++ related difficulties with circular dependencies, not the "chicken or egg" logical paradox. Obviously it is impossible for two objects to contain each other without leading to some form of infinite loop.

  • Impressive, but is there any real "point" to the inline expansion? Is the generated code any different when accessing `b.A_a` rather than `b.a.a`? I would expect it to be exactly the same, so you're doing a lot of work to "optimize" something without much benefit. Just asking. – unwind Nov 04 '13 at 10:16
  • @unwind - the benefit is circular dependency becomes an extinct and irrelevant issue. –  Nov 04 '13 at 10:20
  • Ah, I see. It would perhaps have been clearer if your C-like example had had `A` and `B` in the reverse order. As shown, the C declarations are fine, and there's no circularity problem. – unwind Nov 04 '13 at 10:22
  • @unwind - the code demonstrates the concept of solving the problem, not the problem itself, which is only explained in text. I assumed people would read the body of the question and not just the source code ;) –  Nov 04 '13 at 10:31
  • Please excuse me if I'm mistaken, but I don't think you've "solved" circular dependencies, i.e. `A` being defined in terms of `B` while `B` is defined in terms of `A` at the same time. I guess you are trying to figure out the correct declaration order for the C compiler instead. – Jonas Bötel Nov 04 '13 at 10:55
  • @LumpN - yes, you are correct, the issue of aggregating objects with circular dependency remains, since it is a logical paradox (e.g. chicken and egg), but it makes it much easier for objects to access each other's functionality by making the order of declaration and definition irrelevant through collapsing every structure down to primitive types. I only aim to solve circular dependencies in the manner in which JAVA solves them, and even in JAVA you still get a stack overflow if two objects with circular dependency end up calling each other's constructors infinitely. –  Nov 04 '13 at 11:12
  • @user2341104 I do get the idea and your approach. You are talking about dependencies of types that might not be declared in the required order for C. Still no circularities here, just ordering mismatch/lookahead. My question is why doesn't your "compiler" keep all the structs as is and just emit C code for them in the required order instead? – Jonas Bötel Nov 04 '13 at 11:19
  • But if you intend to use the `A_*` members of `B` as if it were an `A` (ie. casting it to `A*`) you will have endless problems with alignment, padding and the like. – rodrigo Nov 04 '13 at 11:21
  • @user2341104 The other thing being circular referencing header files. Those can easily be solved using `#ifdef`s like in C++. – Jonas Bötel Nov 04 '13 at 11:22
  • @LumpN - it would be easier to collapse all structures to primitive types, while tracking the order will require an extra implementation. But I still haven't made up my mind, that is why I am asking the question. I might end up tracking the dependency order as well. All in all I'd prefer "inline implementations" like in JAVA instead of separating into headers and sources for the sake of improving productivity. –  Nov 04 '13 at 11:26
  • @user2341104 you need to figure out the correct dependency order for inlining too. It's probably already there, hidden in some recursive inlining function of yours. – Jonas Bötel Nov 04 '13 at 11:34
  • @user2341104: But you don't have to separate into header/source. You can emit only source files, and copy the struct definition in every file you need it. As long as you comply with the One Definition Rule, you'll be fine. – rodrigo Nov 04 '13 at 11:36
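
The alternative raised in the comments, keeping the structs as they are and emitting them in dependency order, could be sketched roughly as a depth-first topological sort. `TypeInfo`, `MAX_TYPES`, and `emission_order` are hypothetical names for this illustration, not part of the asker's compiler, and the sketch assumes dependencies are recorded as indices of structs contained by value.

```c
#include <assert.h>

#define MAX_TYPES 8   /* arbitrary limit for this sketch */

/* Hypothetical compiler-side record: one entry per struct type. */
typedef struct {
    const char *name;
    int deps[MAX_TYPES];   /* indices of struct types contained by value */
    int ndeps;
} TypeInfo;

enum { UNVISITED, IN_PROGRESS, DONE };

/* Depth-first visit: a type is appended to `order` only after every type
   it contains by value. Returns 0 on success, -1 on a containment cycle. */
static int visit(const TypeInfo *types, int i, int *state, int *order, int *count)
{
    if (state[i] == DONE) return 0;
    if (state[i] == IN_PROGRESS) return -1;   /* A contains B contains A */
    state[i] = IN_PROGRESS;
    for (int k = 0; k < types[i].ndeps; k++)
        if (visit(types, types[i].deps[k], state, order, count) != 0)
            return -1;
    state[i] = DONE;
    order[(*count)++] = i;
    return 0;
}

/* Fills `order` with type indices in a valid C emission order.
   Returns the number of types ordered, or -1 if a by-value cycle exists. */
int emission_order(const TypeInfo *types, int n, int *order)
{
    int state[MAX_TYPES] = { UNVISITED };
    int count = 0;

    assert(n <= MAX_TYPES);
    for (int i = 0; i < n; i++)
        if (visit(types, i, state, order, &count) != 0)
            return -1;
    return count;
}
```

With `B` (containing `A`) listed before `A`, the function places `A` first; two types that contain each other by value are reported as an error, which matches the "chicken or egg" case the question's EDIT excludes.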

1 Answer


You cannot safely use pointers to the substructures because you cannot get pointers to "compatible types" by pointing to the primitive members. E.g. after

struct Foo {
    short a;
    int b;
};

struct Bar {
    struct Foo foo;
};

struct Bar bar;

the pointers &bar.foo and &bar.foo.a have different types and cannot be used interchangeably. They also cannot be cast to each other's types without violating the strict aliasing rule, triggering undefined behavior.

The problem can be avoided by inlining the entire struct definition each time:

struct Bar {
    struct { short a; int b; } foo;
};

Now &bar.foo is a pointer to struct {short; int;} which is a compatible type for struct Foo.

(There may also be padding/alignment differences between struct-typed members and primitive members, but I couldn't find an example of these.)

Fred Foo
  • I preserve binary compatibility by inserting padding bytes. Not manually: my compiler does it, to conform to what the C compiler would do when aggregating structures. –  Nov 04 '13 at 14:48
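
The padding concern discussed above can be illustrated with a small sketch. It assumes a typical ABI where `int` is 4 bytes with 4-byte alignment, so that `struct Inner` carries tail padding; all struct and function names here are hypothetical, and the explicit `pad_` array stands in for whatever padding the asker's compiler would insert.

```c
#include <assert.h>
#include <stddef.h>   /* offsetof */

struct Inner { int x; char c; };               /* typically padded after c */
struct Outer { struct Inner in; char d; };     /* d starts after Inner's padding */

/* Naive flattening: d can slide into the space Inner used for tail padding. */
struct FlatNaive { int Inner_x; char Inner_c; char d; };

/* Bytes of tail padding inside struct Inner on this platform. */
#define INNER_TAIL_PAD \
    (sizeof(struct Inner) - offsetof(struct Inner, c) - sizeof(char))

static_assert(INNER_TAIL_PAD > 0, "sketch assumes struct Inner has tail padding");

/* Flattening with the padding reproduced explicitly. */
struct FlatPadded {
    int  Inner_x;
    char Inner_c;
    char pad_[INNER_TAIL_PAD];
    char d;
};

/* Each function returns 1 when the flattened layout agrees with struct Outer. */
int naive_matches(void)
{
    return offsetof(struct FlatNaive, d) == offsetof(struct Outer, d)
        && sizeof(struct FlatNaive) == sizeof(struct Outer);
}

int padded_matches(void)
{
    return offsetof(struct FlatPadded, d) == offsetof(struct Outer, d)
        && sizeof(struct FlatPadded) == sizeof(struct Outer);
}
```

Under the stated assumption, the naive form disagrees with `struct Outer` while the explicitly padded form agrees, which is why the flattening has to reproduce the C compiler's padding to stay binary compatible.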