C compiler typecheck algorithm and semantic analysis

Question

Is there a suggested algorithm for finding the type of a variable in C code?

I'm writing a compiler for a small subset of the C language. It currently handles the int and float types, but it should handle any legal C type possible (among functions, int and float), e.g. `int ** (*fp)(int, int)`. Since there is an unbounded number of possibilities, it's not possible to use any kind of enum or hash table.

So how is this problem usually solved?

Can this kind of declaration be handled by an LL(1) parser?

This is a very broad subject indeed! I suggest you look at open source C compilers, eg `pcc` and `tcc`. For a complete type representation, an arborescent structure may be required. — chqrlie, Jan 15 '16 at 00:41
I don't think that the standard gives me any help. I just wondered what the easiest way to do those checks is, and whether there is any commonly used or suggested approach. @FUZxxl — happytomato, Jan 15 '16 at 00:51
The spec will specify what you can and cannot do [legality wise]. But, pull the source to some compilers (e.g. gcc, clang) and look at them, with particular reference to their grammar files. Also, "tiny c" might be easier to look at. IIRC, C can be parsed LALR(1), but I think gcc switched to "top down" a few revs back — Craig Estey, Jan 15 '16 at 01:07
This is too broad to actually answer, but here's a hint: types can be complex data structures, you will need to use tree-like structures and not just simple `enum`s or similar. — user253751, Jan 15 '16 at 01:07

Gene · Answer 1 · 2016-01-15T05:22:22.657

This is a question far too deep to answer completely here. However, most compilers use a graph data structure to represent types. (Many years ago, the graphs were encoded in elaborate ways to save space, but these days that's not necessary.) The graph nodes for C are a recursive type (as most graph nodes are) roughly like this:

#include <stddef.h>  // for size_t

typedef enum {
  VOID, INT, CHAR, FLOAT, DOUBLE, ENUM, POINTER, ARRAY, STRUCT, UNION, FUNCTION,
} KIND;

typedef struct type_s {
  KIND kind;
  union {
    struct enumeration_s {
      int n_values;
      struct enum_value_s *values;
    } enumeration;
    struct pointer_s {
      struct type_s *to_type;
    } pointer;
    struct array_s {
      struct type_s *of_type;
      size_t n_elements;
    } array;
    struct struct_or_union_s {
      size_t n_fields;
      struct field_s *fields;  // Variable-sized array of fields.
    } struct_or_union;
    struct function_s {
      struct type_s *return_type;
      size_t n_args;
      struct field_s *args;    // Variable-sized array of args.
    } function;
  } u;
} TYPE;

typedef struct enum_value_s {
  char *name;
  int value;
} ENUM_VALUE;

typedef struct field_s {
  char *name;
  struct type_s *type;
} FIELD;

If you have already built a compiler, then you ought to know what an abstract syntax tree is. This is just an AST for types. You should be able to easily draw a graph of the type for `int ** (*fp)(int, int)`. (It's a graph rather than a tree because you want the nodes for the leaf types `INT`, ... to be singletons.)
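As a rough illustration of that graph, here is a minimal sketch that builds the type of `fp` by hand and compares two types structurally. It uses a trimmed-down copy of the structures above with only the three kinds this declaration needs. The helper names (`new_type`, `pointer_to`, `function_returning`, `same_type`, `describe`, `build_fp_type`) are made up for this example and are not taken from any particular compiler.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef enum { INT, POINTER, FUNCTION } KIND;  /* trimmed to what the example needs */

typedef struct type_s TYPE;
typedef struct field_s { const char *name; TYPE *type; } FIELD;

struct type_s {
  KIND kind;
  union {
    struct { TYPE *to_type; } pointer;
    struct { TYPE *return_type; size_t n_args; FIELD *args; } function;
  } u;
};

static TYPE int_type = { .kind = INT };  /* singleton leaf node shared by all users */

static TYPE *new_type(KIND kind) {
  TYPE *t = calloc(1, sizeof *t);
  if (!t) { perror("calloc"); exit(EXIT_FAILURE); }
  t->kind = kind;
  return t;
}

static TYPE *pointer_to(TYPE *target) {
  assert(target != NULL);
  TYPE *t = new_type(POINTER);
  t->u.pointer.to_type = target;
  return t;
}

static TYPE *function_returning(TYPE *ret, size_t n_args, FIELD *args) {
  TYPE *t = new_type(FUNCTION);
  t->u.function.return_type = ret;
  t->u.function.n_args = n_args;
  t->u.function.args = args;
  return t;
}

/* Structural equality: same kind and recursively equal children. */
static int same_type(const TYPE *a, const TYPE *b) {
  if (a == b) return 1;
  if (a->kind != b->kind) return 0;
  switch (a->kind) {
  case INT:
    return 1;
  case POINTER:
    return same_type(a->u.pointer.to_type, b->u.pointer.to_type);
  case FUNCTION:
    if (a->u.function.n_args != b->u.function.n_args) return 0;
    if (!same_type(a->u.function.return_type, b->u.function.return_type)) return 0;
    for (size_t i = 0; i < a->u.function.n_args; i++)
      if (!same_type(a->u.function.args[i].type, b->u.function.args[i].type)) return 0;
    return 1;
  }
  return 0;
}

static void append(char *buf, size_t cap, const char *s) {
  size_t len = strlen(buf);
  if (len < cap) snprintf(buf + len, cap - len, "%s", s);
}

/* Appends an English reading of t to buf; buf must already hold a string. */
static void describe(const TYPE *t, char *buf, size_t cap) {
  switch (t->kind) {
  case INT:
    append(buf, cap, "int");
    break;
  case POINTER:
    append(buf, cap, "pointer to ");
    describe(t->u.pointer.to_type, buf, cap);
    break;
  case FUNCTION:
    append(buf, cap, "function(");
    for (size_t i = 0; i < t->u.function.n_args; i++) {
      if (i) append(buf, cap, ", ");
      describe(t->u.function.args[i].type, buf, cap);
    }
    append(buf, cap, ") returning ");
    describe(t->u.function.return_type, buf, cap);
    break;
  }
}

/* The type of fp in: int ** (*fp)(int, int) */
static TYPE *build_fp_type(void) {
  static FIELD args[2] = { { NULL, &int_type }, { NULL, &int_type } };
  TYPE *ret = pointer_to(pointer_to(&int_type));
  return pointer_to(function_returning(ret, 2, args));
}
```

Calling `describe(build_fp_type(), buf, sizeof buf)` on an empty string buffer reads the graph back from the root. Because both parameters point at the same `int_type` node, the result is a graph rather than a tree. A type checker would typically use something like `same_type` when comparing the two sides of an assignment, though real C compatibility rules are more involved than plain structural equality.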

And yes (with the exception of the well-known `typedef` ambiguity that you might already be handling), it's not hard to generate these type graphs in an LL(1) or LR(1) parser.
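To make the LL(1) claim more concrete, here is a small recursive-descent sketch in the spirit of the classic "dcl" exercise. It is only an illustration under simplifying assumptions: it emits an English reading of the declaration instead of building graph nodes, the base type is a single word, parameters are restricted to a type word followed by optional `*`s, and there are no typedefs or qualifiers. All function names (`eat`, `word`, `params`, `direct`, `declarator`, `describe_decl`) are hypothetical. Each decision is made from the next non-blank character alone.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

static const char *src;   /* remaining input */
static char out[256];     /* description of the declarator, built inside-out */
static char name[32];     /* the identifier being declared */
static int failed;        /* set on the first syntax error */

static void emit(const char *s) {
  size_t len = strlen(out);
  snprintf(out + len, sizeof out - len, "%s", s);
}

static void skip_ws(void) { while (isspace((unsigned char)*src)) src++; }

/* Consume c if it is the next non-blank character. */
static int eat(char c) {
  skip_ws();
  if (*src != c) return 0;
  src++;
  return 1;
}

/* Read an identifier, keyword, or number into buf; returns 0 if none. */
static int word(char *buf, size_t cap) {
  size_t n = 0;
  skip_ws();
  while ((isalnum((unsigned char)*src) || *src == '_') && n + 1 < cap)
    buf[n++] = *src++;
  buf[n] = '\0';
  return n > 0;
}

static void declarator(void);

/* params := ')' | param (',' param)* ')'   with   param := word '*'* */
static void params(void) {
  char base[32];
  emit("function(");
  if (!eat(')')) {
    for (;;) {
      int stars = 0;
      if (!word(base, sizeof base)) { failed = 1; return; }
      while (eat('*')) stars++;
      while (stars-- > 0) emit("pointer to ");
      emit(base);
      if (!eat(',')) break;
      emit(", ");
    }
    if (!eat(')')) { failed = 1; return; }
  }
  emit(") returning ");
}

/* direct := (name | '(' declarator ')') ('(' params | '[' size ']')* */
static void direct(void) {
  if (eat('(')) {                      /* grouping parentheses */
    declarator();
    if (!eat(')')) failed = 1;
  } else if (!word(name, sizeof name)) {
    failed = 1;
  }
  while (!failed) {
    if (eat('(')) {
      params();
    } else if (eat('[')) {
      char size[16];
      word(size, sizeof size);         /* the size may be empty */
      emit("array["); emit(size); emit("] of ");
      if (!eat(']')) failed = 1;
    } else {
      break;
    }
  }
}

/* declarator := '*'* direct ; pointers bind after the suffixes of direct */
static void declarator(void) {
  int stars = 0;
  while (eat('*')) stars++;
  direct();
  while (stars-- > 0) emit("pointer to ");
}

/* Returns 1 and fills result with "name: description" on success. */
static int describe_decl(const char *decl, char *result, size_t cap) {
  char base[32];
  assert(cap > 0);
  src = decl; out[0] = '\0'; name[0] = '\0'; failed = 0;
  if (!word(base, sizeof base)) return 0;
  declarator();
  skip_ws();
  if (failed || *src != '\0') return 0;
  snprintf(result, cap, "%s: %s%s", name, out, base);
  return 1;
}
```

The order in which `emit` is called is the order in which type nodes would be chained from the root. A parser that builds the graph can follow the same traversal and create a pointer, array, or function node at each of those points. Emitting text simply keeps the sketch short.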
