2

As far as I know after the parser builds the AST this structure is transformed into a 'intermediate level' IR, for example three adress code or any other. Then, in order to perfom some analysis this IR is transformed into a control flow graph. My question is, if it is possible going from the AST representation to the CFGs without going through another IR, and then perform, for example, data flow analysis through the CFGs with successfull results?

helloStack
  • 45
  • 7

1 Answers1

3

You can't construct a CFG without first doing scope and then name resolution.

You need scope resolution to determine the "scope" of implicit control transfers, e.g., the boundaries of an if-then-else, a try-statement, blocks with "break statements", return statements, etc. This is generally fairly easy because scopes tend to follow the syntax structure of the language, which you already have with an AST.

You also need scope resolution to determine where identifiers are defined, if your language allows control transfers to named entities ("goto "), "call ". You can't know which a goto targets without knowing which scope contains the goto, and how scope control the lookup of the named label.

With scope resolution you can implement name resolution; this allows you to assign names to the defining scope containing them, and to attach the definition of the name (e.g., knowing that "goto x" refers to the x in specific scope, and thus to line 750 where x is defined in that scope.

Once you have name resolution (so you can look up the definition of x in "goto x"), now you can construct a control flow graph.

You can do all three of these using attribute grammars, which are essentially computations directly over the AST. So, no, you don't need anything other than the AST to implement these. {You can learn more about attribute computations at my SO answer describing them. Of course, anything you can do with a formal attribute computation you can also do by simply writing lots of recursive procedures that walk over the tree and compute the equivalent result; that's what attribute grammars are compiled to in practice.

Here you can find some details on how to extract the control flow graph by attribute computation

One messy glitch are computations like "goto @" (GCC allows this). To do this right, you need to compute data flow for label variables (which technically require you to construct a CFG first :-( ). You can avoid doing the data flow by constructing a conservative answer: "goto @" can go to any label whose address is taken in the function which the goto is found. You can compute this with an attribute grammar, too.

For languages which have only structured control flow, you can actually implement data flow analysis directly by attribute grammar.

Community
  • 1
  • 1
Ira Baxter
  • 93,541
  • 22
  • 172
  • 341
  • First of all, thank you so much for the detailed and usefull answer. I think i have understood what you said. However i have some doubts about the implementation of the scope and name resolution. I think on it as a data structure for each scope and then inside it the constructions that belongs to that particular scope. I can extract its particular scopes because the structure of the AST as you say in the answer. I am maybe lost? – helloStack May 30 '16 at 18:32
  • 1
    Each scope is abstractly a mapping from an arbitrary set of program locations to a set of identifier definitions defined within that scope. In many languages (maybe yours, maybe not), the "set of program locations" for a scope always corresponds to a subtree of the AST; in such languages scopes tend to nest lexically. (Languages with "namespace" constructs break this rule for instance namespaces). For such languages, it is easy to identify the scopes; they correspond to tree nodes like "parameter list", "block body", ... I don't think you are lost. – Ira Baxter May 30 '16 at 18:48
  • The language that I am parsing is C, which I think that respects the rule that you are saying. Thank you again for the answer, are helping me a lot. – helloStack May 30 '16 at 19:46