formal method to do semantic analysis in compiler

Question

I know there is a formalism called attribute grammars, and an informal method called syntax-directed translation, but the former is inefficient and the latter is difficult to automate.

Are there other, more recent formalisms for semantic analysis?

This question appears to be off-topic because it is not a practical, answerable question within the scope of http://stackoverflow.com/help/on-topic. Similar vague and abstract questions might receive some attention at the [Computer Science Stack Exchange](http://cs.stackexchange.com/help/on-topic) site — xmojmr, Jan 15 '15 at 08:12
I continue to be stunned by some of the reactions from SO people who insist on closing questions because they don't happen to know an answer. Of all the questions one might ask on SO, questions about "formal semantics" are the *least* likely to be *vague*. And the problem of computing such semantics for a program *is* a perfectly valid "programming problem"; OP clearly wants to compute them *efficiently*. I provide exactly such a *practical* answer to this question here — Ira Baxter, Jan 15 '15 at 08:45
@IraBaxter or worse, they don't know enough to *understand* the question. A few weeks back someone asked about "warming a server", a significant concern for anyone running high-traffic web farms. Not only was the question closed, the OP was ridiculed until he deleted the question, before anyone knowledgeable could answer — Panagiotis Kanavos, Jan 15 '15 at 08:59
@PanagiotisKanavos: I was polite in my interpretation of the reasoning of the closers. — Ira Baxter, Jan 15 '15 at 09:01
@IraBaxter I'm not a close-voter on this question, but I don't agree. There may well be products out there that do this, but if they constitute an answer, the question must be a request for an off-site resource: otherwise it is just too broad. Speaking as an alumnus of UCSC Compiler Construction '79. — user207421, Jan 15 '15 at 09:50
@EJP: If the question was "is it possible to compile and run C# efficiently?", how would you answer in any way other than with an offsite reference? "Yes" is not an opinion, but it is also not an adequate answer. You could leave out the actual link; all that does is make the answer hard to use. [SO policy discourages offsite references; I think that silly but gave up arguing it on Meta. But it isn't black or white.] — Ira Baxter, Jan 15 '15 at 10:04
@EJP: You are welcome to provide an alternate answer, as an alumnus. — Ira Baxter, Jan 15 '15 at 10:07

Ira Baxter · Answer 1 · 2015-02-05T09:23:48.040

OP suggests "attribute grammars" are inefficient, and syntax-directed translation is difficult to automate. I offer a proof-point showing otherwise, name a few other semantic systems, and suggest how they might be integrated, below.

Our DMS Software Reengineering Toolkit supports both of these activities and more.

It provides parsers for full context-free grammars, and the ability to define, compile, and execute (in parallel) attribute grammars with arbitrary data and operations, and arbitrary flow across the syntax nodes. One can compute metrics, build symbol tables, or perform semantic analysis with such attribute grammars.

Given a (DMS) grammar rule:

  LHS = RHS1 ... RHSN ;

one writes a DMS attribute grammar rule for a named attribute grammar computation Pass1 (for practical reasons, there can be many different passes, some even building on one another's results) in the form:

  <<Pass1>>:  {  LHS.propertyI=fn1(RHSx.propertyY,...);
                 ...
                 RHSa.propertyB=fn2(RHSp.propertyQ,...);
                 ...
              }

for a set of properties (of arbitrary type) associated with each grammar element, on either the left- or right-hand side of the grammar rule, using arbitrary functions fnI defined over the types involved and implemented in DMS's underlying (parallel) language, PARLANSE. DMS computes the dataflows across the set of rules, determines a partial-order (parallel) computation that achieves them, and compiles this into PARLANSE code for execution. The result of an attribute computation is a tree decorated with the computed properties.
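DMS's scheduler and its PARLANSE code generation are not shown here. Purely to illustrate the idea of deriving a partial-order (parallel) schedule from attribute dependencies, here is a minimal Python sketch: each attribute instance is a graph node, the standard-library `graphlib` topologically sorts them, and independent computations are grouped into "waves" that could run in parallel. The `"node.attribute"` naming and the rule encoding are hypothetical assumptions, not DMS notation.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+
from typing import Callable, Dict, List, Tuple

# Hypothetical encoding: each attribute instance is named "node.attribute", and
# each rule maps a target attribute to (function, [source attributes]), mirroring
# the LHS.propertyI = fn1(RHSx.propertyY, ...) form above.
Rules = Dict[str, Tuple[Callable, List[str]]]


def schedule(rules: Rules) -> List[List[str]]:
    """Group attribute instances into waves. Everything in a wave depends only on
    earlier waves, so the computations inside one wave could run in parallel.
    A circular attribute dependency raises graphlib.CycleError."""
    graph = {target: set(sources) for target, (_, sources) in rules.items()}
    sorter = TopologicalSorter(graph)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


def evaluate(rules: Rules, inputs: Dict[str, object]) -> Dict[str, object]:
    """Run the rules wave by wave (sequentially here) and return every attribute value."""
    values = dict(inputs)
    for wave in schedule(rules):
        for attr in wave:
            if attr in rules:  # attributes without a rule must be supplied as inputs
                fn, sources = rules[attr]
                values[attr] = fn(*(values[s] for s in sources))
    return values


# Example: a node "sum" with children "a" and "b".
EXAMPLE_RULES: Rules = {
    "sum.value": (lambda x, y: x + y, ["a.value", "b.value"]),  # synthesized
    "sum.doubled": (lambda v: 2 * v, ["sum.value"]),             # synthesized
    "a.depth": (lambda d: d + 1, ["sum.depth"]),                 # inherited
}
EXAMPLE_INPUTS = {"a.value": 3, "b.value": 4, "sum.depth": 0}
```

With the example rules, `schedule(EXAMPLE_RULES)` yields three waves: the three inputs, then `a.depth` and `sum.value` together, then `sum.doubled`. A real attribute grammar evaluator schedules per grammar rule and per tree node rather than over a flat dictionary; this only shows the dependency-ordering step.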

With care, one should be able to define a denotational semantics of a language and compute it with an attribute grammar. One of the key notions in denotational semantics is that of an "environment", which maps identifiers to types and possibly symbolic values (the former is traditionally called a symbol table). At AST nodes that introduce new scopes, one would write an attribute function that creates a new environment by combining the parent environment with the newly introduced identifiers, and passes it down from the AST node to its children; e.g., for the rule

exp = 'let' ID '=' exp1 'in' exp2;

one might code an attribute grammar rule:

<<Denotation>>: {
     exp1.env = exp.env;
     exp2.env = augment_environment(exp.env,
                                    new_variable_and_value_pair(name(ID.),
                                                                exp1.value));
     exp.value = exp2.value;
               }
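The same inherited-`env` / synthesized-`value` pattern can be sketched as a plain recursive Python evaluator (not DMS): the `env` argument passed downward plays the role of the inherited attribute `env`, and the return value plays the synthesized attribute `value`. The node classes are hypothetical, and `augment_environment` here is a stand-in named after the function in the rule above, not a DMS API.

```python
from dataclasses import dataclass
from typing import Dict, Union


@dataclass(frozen=True)
class Num:
    value: int


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Add:
    left: "Expr"
    right: "Expr"


@dataclass(frozen=True)
class Let:            # let ID = exp1 in exp2
    name: str         # ID
    bound: "Expr"     # exp1
    body: "Expr"      # exp2


Expr = Union[Num, Var, Add, Let]
Env = Dict[str, int]


def augment_environment(env: Env, name: str, value: int) -> Env:
    """Return a new environment extended with name -> value; the parent env is not modified."""
    return {**env, name: value}


def value(exp: Expr, env: Env) -> int:
    """Synthesized attribute 'value', computed recursively with 'env' passed down as the inherited attribute."""
    if isinstance(exp, Num):
        return exp.value
    if isinstance(exp, Var):
        return env[exp.name]  # a KeyError here means an undeclared identifier
    if isinstance(exp, Add):
        return value(exp.left, env) + value(exp.right, env)
    if isinstance(exp, Let):
        bound_value = value(exp.bound, env)                          # exp1.env = exp.env
        body_env = augment_environment(env, exp.name, bound_value)   # exp2.env = augment(...)
        return value(exp.body, body_env)                             # exp.value = exp2.value
    raise TypeError(f"unknown node: {exp!r}")
```

Because each `let` builds a fresh environment instead of mutating its parent's, inner bindings shadow outer ones only within their own body.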

I'm not sure what the OP means by (attribute grammars are) "inefficient". We've used DMS attribute grammars to compute semantic properties (name and type resolution) for all of C++14. While such a definition is huge by most academic paper standards, it is that way because C++14 itself is huge and an astonishing mess ("camel by committee"). In spite of this, our attribute grammar seems to run well enough. More importantly, it is powerful enough for a very small team to build it (in contrast to the scale of "team" supporting Clang).

DMS also provides the ability to encode source-to-source transformations ("rewrites") using the surface syntax of the source and target (if different from the source) languages, in the form "if you see this, replace it by that". These rewrites are applied to the parse trees to produce revised trees; a prettyprinter ("anti-parser") provided by DMS can then regenerate source code for the target language. If one limits oneself to rewrites that exactly tile the original AST, one gets "syntax-directed translation". OP might claim this (syntax-directed translation) is difficult to automate; I'd agree, but the work has been done and is available. OP does have to decide what rules she wants to define and execute.
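As a hedged illustration of what "rewrites that exactly tile the AST" amount to, here is a hand-written Python sketch (not DMS rules) of syntax-directed translation: one rule per node kind, each building its output only from its children's translations. The expression AST and the stack-machine instruction names are hypothetical; the small `run` function exists only to check the output.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass(frozen=True)
class Num:
    value: int


@dataclass(frozen=True)
class BinOp:
    op: str          # "+", "-" or "*"
    left: "Expr"
    right: "Expr"


Expr = Union[Num, BinOp]
OPCODES = {"+": "ADD", "-": "SUB", "*": "MUL"}


def translate(exp: Expr) -> List[str]:
    """One translation rule per node kind. Each rule's output is built only from the
    translations of that node's children, so the rules tile the tree exactly."""
    if isinstance(exp, Num):
        return [f"PUSH {exp.value}"]
    if isinstance(exp, BinOp):
        return translate(exp.left) + translate(exp.right) + [OPCODES[exp.op]]
    raise TypeError(f"no translation rule for {exp!r}")


def run(code: List[str]) -> int:
    """A tiny stack machine, used only to check the translation."""
    stack: List[int] = []
    for instr in code:
        if instr.startswith("PUSH "):
            stack.append(int(instr[len("PUSH "):]))
            continue
        right, left = stack.pop(), stack.pop()
        if instr == "ADD":
            stack.append(left + right)
        elif instr == "SUB":
            stack.append(left - right)
        elif instr == "MUL":
            stack.append(left * right)
        else:
            raise ValueError(f"unknown instruction: {instr}")
    return stack.pop()
```

For example, `(2 + 3) * 4` translates to `PUSH 2, PUSH 3, ADD, PUSH 4, MUL`, which `run` evaluates to 20.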

DMS rewrite rules take the form:

 rule rule_name(parameter1:syntax_category1, ... parameterN:syntax_categoryN)
   :  source_syntax_category -> target_syntax_category
   = "  <text in source language>  "
  ->
   "  <text in target language> "
  if  condition_of_matched_source_pattern;

where the parameters are placeholders for syntax-typed subtrees, the rule maps a tree of type source_syntax_category to a tree of type target_syntax_category (often the same category), and the "..." are meta-quotes wrapped around surface syntax, with "\"-prefixed embedded escapes for the parameters where needed. The meta-quoted code fragments are interpreted as specifications for trees (using the same parsing engine that reads the source code); this is not string matching. An example:

  rule simplify_if_then_else(c:condition,t:then_clause,e:else_clause):
     statement->statement
  =  " if \c then \t else \e "
  -> " \t "
  if c == "true";
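To illustrate the point that a pattern denotes a tree rather than a string, here is a minimal Python matcher with placeholders, applied to an analogue of simplify_if_then_else. Trees are encoded as nested tuples (a hypothetical encoding), there is no parser and no syntax-category checking, and this is not how DMS implements matching.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Optional

# Hypothetical tree encoding: inner nodes are tuples (label, child, ...); leaves are strings.
Tree = Any
Bindings = Dict[str, Tree]


@dataclass(frozen=True)
class Hole:
    """A pattern placeholder, playing the role of a rule parameter such as c, t or e."""
    name: str


def match(pattern: Tree, tree: Tree, bindings: Bindings) -> Optional[Bindings]:
    """Structural match of a pattern tree against a subject tree (not a string match)."""
    if isinstance(pattern, Hole):
        if pattern.name in bindings:  # a repeated placeholder must bind the same subtree
            return bindings if bindings[pattern.name] == tree else None
        return {**bindings, pattern.name: tree}
    if isinstance(pattern, tuple) and isinstance(tree, tuple):
        if len(pattern) != len(tree):
            return None
        for sub_pattern, sub_tree in zip(pattern, tree):
            result = match(sub_pattern, sub_tree, bindings)
            if result is None:
                return None
            bindings = result
        return bindings
    return bindings if pattern == tree else None


def instantiate(template: Tree, bindings: Bindings) -> Tree:
    """Build the replacement tree, substituting bound subtrees for placeholders."""
    if isinstance(template, Hole):
        return bindings[template.name]
    if isinstance(template, tuple):
        return tuple(instantiate(part, bindings) for part in template)
    return template


def apply_rule(tree: Tree, lhs: Tree, rhs: Tree,
               condition: Callable[[Bindings], bool] = lambda b: True) -> Tree:
    """Rewrite tree if it matches lhs and the condition holds; otherwise return it unchanged."""
    bindings = match(lhs, tree, {})
    if bindings is not None and condition(bindings):
        return instantiate(rhs, bindings)
    return tree


# Analogue of simplify_if_then_else: " if \c then \t else \e " -> " \t " if c == "true".
IF_LHS = ("if", Hole("c"), Hole("t"), Hole("e"))
IF_RHS = Hole("t")
TRUE_TREE = ("bool", "true")  # the tree the literal "true" would parse to (hypothetical)


def simplify_if_then_else(tree: Tree) -> Tree:
    return apply_rule(tree, IF_LHS, IF_RHS, lambda b: b["c"] == TRUE_TREE)
```

The condition compares the bound subtree with the tree for `true`, which is the tree-level counterpart of `c == "true"` in the DMS rule.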

A more "semantic" generalization of the (purely syntactic) check above would be

  ...
  if can_determine_is_true(c);

which assumes a custom predicate that consults other DMS-derivable results to decide whether the instantiated condition is always true at the point where it is found (the matched tree c carries its source position with it, so the context is implied). One might build control and data flow for the desired language, and use the resulting dataflow to determine the values that arrive at the condition c, which may then turn out to be always "true" in a nontrivial way.

I have assumed a DMS-defined support predicate, "can_determine_is_true"; this is just a bit of custom PARLANSE code.
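The answer leaves can_determine_is_true unspecified. One hypothetical way such a predicate could work is sketched below in Python: forward constant propagation over straight-line code, then constant-folding the condition. It assumes a toy IR and ignores branches and loops, which a real implementation would handle through the control and data flow analysis mentioned above.

```python
from typing import Dict, List, Optional, Tuple, Union

# Hypothetical toy IR: an expression is an int literal, a variable name,
# or a tuple (operator, left, right).
Expr = Union[int, str, Tuple[str, "Expr", "Expr"]]
Assignment = Tuple[str, Expr]

OPS = {
    "+": lambda a, b: a + b,
    "*": lambda a, b: a * b,
    "<": lambda a, b: a < b,
    "==": lambda a, b: a == b,
}


def const_value(expr: Expr, consts: Dict[str, object]) -> Optional[object]:
    """Value of expr if it is a compile-time constant given consts, else None (unknown)."""
    if isinstance(expr, int):
        return expr
    if isinstance(expr, str):
        return consts.get(expr)
    op, left, right = expr
    a, b = const_value(left, consts), const_value(right, consts)
    if a is None or b is None:
        return None
    return OPS[op](a, b)


def propagate(assignments: List[Assignment]) -> Dict[str, object]:
    """Forward constant propagation over straight-line code (no branches or loops).
    Variables never assigned a constant (e.g. program inputs) stay unknown."""
    consts: Dict[str, object] = {}
    for var, expr in assignments:
        val = const_value(expr, consts)
        if val is None:
            consts.pop(var, None)  # var no longer holds a known constant
        else:
            consts[var] = val
    return consts


def can_determine_is_true(condition: Expr, code_before: List[Assignment]) -> bool:
    """True only when the analysis proves the condition is true at this point;
    False means 'false or unknown', so a rewrite guarded by it stays safe."""
    return const_value(condition, propagate(code_before)) is True
```

Returning False for "unknown" is the conservative choice: the rewrite fires only when the condition is provably true.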

More generally, since the rewrites transform one tree into another tree, one can apply an arbitrarily long and complex set of transformation rules repeatedly to the entire tree. This gives DMS rewrites the power of a Post rewriting system (generalized from strings to trees), and thus makes them Turing-capable: with sufficient transforms you can, in principle, produce any arbitrary transformation of the original tree. Usually one uses other features of DMS to make writing the transforms a bit easier; for instance, a rewrite rule may consult the result of a particular attribute grammar computation in order to "easily" use information from "far away in the tree" (individual rewrite rules always have a fixed, maximum "radius").
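Repeated whole-tree rewriting until no rule applies (a normal form) can be sketched in Python on toy arithmetic trees. The three rules are hypothetical examples, not DMS rules; the pass limit in `normalize` reflects the fact that, in a system this powerful, an arbitrary rule set need not terminate.

```python
from typing import Callable, List, Optional, Tuple, Union

# Hypothetical toy trees: leaves are ints or variable names, inner nodes are (op, left, right).
Tree = Union[int, str, Tuple[str, "Tree", "Tree"]]
Rule = Callable[[Tree], Optional[Tree]]  # returns the rewritten node, or None if it does not apply


def add_zero(t: Tree) -> Optional[Tree]:        # e + 0  ->  e
    if isinstance(t, tuple) and t[0] == "+" and t[2] == 0:
        return t[1]
    return None


def mul_one(t: Tree) -> Optional[Tree]:         # e * 1  ->  e
    if isinstance(t, tuple) and t[0] == "*" and t[2] == 1:
        return t[1]
    return None


def fold_constants(t: Tree) -> Optional[Tree]:  # n1 op n2  ->  value
    if isinstance(t, tuple) and t[0] in ("+", "*") and isinstance(t[1], int) and isinstance(t[2], int):
        return t[1] + t[2] if t[0] == "+" else t[1] * t[2]
    return None


RULES: List[Rule] = [add_zero, mul_one, fold_constants]


def rewrite_pass(tree: Tree, rules: List[Rule]) -> Tree:
    """One bottom-up pass: rewrite the children, then apply rules at this node until none fires."""
    if isinstance(tree, tuple):
        op, left, right = tree
        tree = (op, rewrite_pass(left, rules), rewrite_pass(right, rules))
    fired = True
    while fired:
        fired = False
        for rule in rules:
            result = rule(tree)
            if result is not None:
                tree, fired = result, True
                break
    return tree


def normalize(tree: Tree, rules: List[Rule] = RULES, max_passes: int = 100) -> Tree:
    """Repeat whole-tree passes until nothing changes. With arbitrary rule sets,
    termination is not guaranteed, hence the pass limit."""
    for _ in range(max_passes):
        new_tree = rewrite_pass(tree, rules)
        if new_tree == tree:
            return tree
        tree = new_tree
    raise RuntimeError("no fixpoint within max_passes; the rule set may not terminate")
```

Here, `normalize(("*", ("+", "x", ("+", 0, 0)), ("+", 1, 0)))` reduces step by step to `"x"`.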

DMS provides a lot of additional support machinery to help one construct control flow graphs and/or compute dataflow with efficient parallel solvers. DMS also has a wide variety of front ends for languages such as C, C++14, Java 1.8, IBM Enterprise COBOL, ..., so that a tool engineer can concentrate on building the tool she wants, rather than fighting to build a parser from scratch (only to discover that one must live Life After Parsing).
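DMS's flow-analysis machinery and parallel solvers are not shown in the answer. As a generic, sequential illustration of what a dataflow solver computes, here is a worklist solver for live-variable analysis in Python. The CFG, block names and use/def sets are a hypothetical example.

```python
from typing import Dict, List, Set, Tuple

CFG = Dict[str, List[str]]  # block name -> successor block names


def live_variables(succ: CFG, use: Dict[str, Set[str]],
                   defs: Dict[str, Set[str]]) -> Tuple[Dict[str, Set[str]], Dict[str, Set[str]]]:
    """Backward liveness solved with a worklist:
         live_out[n] = union of live_in[s] for s in succ[n]
         live_in[n]  = use[n] | (live_out[n] - defs[n])
    A block is re-queued only when the live_in of one of its successors changes."""
    preds: Dict[str, Set[str]] = {n: set() for n in succ}
    for n, targets in succ.items():
        for s in targets:
            preds[s].add(n)
    live_in: Dict[str, Set[str]] = {n: set() for n in succ}
    live_out: Dict[str, Set[str]] = {n: set() for n in succ}
    worklist = list(succ)
    while worklist:
        n = worklist.pop()
        out = set().union(*(live_in[s] for s in succ[n]))
        live_out[n] = out
        new_in = use[n] | (out - defs[n])
        if new_in != live_in[n]:
            live_in[n] = new_in
            for p in preds[n]:
                if p not in worklist:
                    worklist.append(p)
    return live_in, live_out


# Example CFG (hypothetical):
#   B0: x = input()      B1: y = x + 1      B2: if y > 0 goto B3 else B4
#   B3: print(x)         B4: return
SUCC = {"B0": ["B1"], "B1": ["B2"], "B2": ["B3", "B4"], "B3": ["B4"], "B4": []}
USE = {"B0": set(), "B1": {"x"}, "B2": {"y"}, "B3": {"x"}, "B4": set()}
DEFS = {"B0": {"x"}, "B1": {"y"}, "B2": set(), "B3": set(), "B4": set()}
```

On the example, x is live into B2 (it is still needed by `print(x)` in B3), while nothing is live into B0.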

If the OP is interested in a recent overview of another style of semantics (structured operational semantics), he might consult the course notes for Semantics of Programming Languages. We claim the techniques in such notes can be implemented on top of DMS if one likes.
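For readers new to the structured operational style: a small-step semantics defines a one-step transition relation on terms, and evaluation is the repeated application of that relation. The Python sketch below implements such rules for a hypothetical toy language (integers, booleans, addition, comparison, if); it illustrates the style and is not material from the course notes.

```python
from typing import List, Tuple, Union

# Hypothetical toy terms: ints and bools are values; compound terms are tuples
#   ("add", e1, e2) | ("lt", e1, e2) | ("if", cond, then_e, else_e)
Term = Union[int, bool, tuple]


def is_value(e: Term) -> bool:
    return isinstance(e, (int, bool))


def step(e: Term) -> Term:
    """One small-step transition e -> e'. Only call on non-values."""
    tag = e[0]
    if tag in ("add", "lt"):
        _, e1, e2 = e
        if not is_value(e1):                 # congruence rule: reduce the left operand first
            return (tag, step(e1), e2)
        if not is_value(e2):                 # then the right operand
            return (tag, e1, step(e2))
        return e1 + e2 if tag == "add" else e1 < e2   # both operands are values: compute
    if tag == "if":
        _, cond, then_e, else_e = e
        if not is_value(cond):
            return ("if", step(cond), then_e, else_e)
        return then_e if cond else else_e    # if-true / if-false rules
    raise ValueError(f"stuck term: {e!r}")


def evaluate(e: Term) -> Tuple[Term, List[Term]]:
    """Apply step until a value is reached; return the value and the full trace."""
    trace = [e]
    while not is_value(e):
        e = step(e)
        trace.append(e)
    return e, trace
```

Evaluating `("if", ("lt", 1, 2), ("add", 10, 5), 0)` takes three steps: the condition reduces to `True`, the `if` selects its then-branch, and the addition yields 15.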

One can make a long list of academic tools that implement some of these ideas. Most of them are research tools and not mature. One such research system, JastAdd, is an attribute grammar evaluation system; I hear that it stands out in capability and performance, but I have no specific experience with it.
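JastAdd is generally described as evaluating attributes on demand and caching their values, rather than computing a static schedule as described for DMS above. The general idea (not JastAdd's API or generated code) can be sketched in Python with `functools.cached_property`; the `Node` class and its attributes are hypothetical.

```python
from functools import cached_property
from typing import Iterable, Optional


class Node:
    """AST node whose attributes are computed on first access and then cached."""

    def __init__(self, name: str, children: Iterable["Node"] = ()):
        self.name = name
        self.children = list(children)
        self.parent: Optional["Node"] = None
        for child in self.children:
            child.parent = self

    @cached_property
    def depth(self) -> int:
        """Inherited-style attribute: defined in terms of the parent."""
        return 0 if self.parent is None else self.parent.depth + 1

    @cached_property
    def size(self) -> int:
        """Synthesized-style attribute: defined in terms of the children."""
        return 1 + sum(child.size for child in self.children)

    @cached_property
    def root(self) -> "Node":
        """Reference-valued attribute: its value is another node in the tree."""
        return self if self.parent is None else self.parent.root
```

Asking for a leaf's `depth` computes and caches `depth` only on that leaf and its ancestors; attributes that are never requested are never evaluated.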

Where is the answer to the original question in all this? Why shouldn't this be flagged as spam? It's not copy-pasted, but only 10% of the sentences seem to answer the question, and they are spread all over. *Is* there another formalism or not? If yes, what is it called and what does it do? — Panagiotis Kanavos, Jan 15 '15 at 08:56
OP opens by objecting to two existing styles; I show his objections are not as much of an issue as he suggests. I discuss other semantic styles [granted, briefly] *and* how they fit in. When I write more general answers (without the "programming code" examples), people object that such answers don't fit the spirit of SO. I could have left the examples off. — Ira Baxter, Jan 15 '15 at 08:59
Try highlighting the sentences where you do that, then check how much of the text is essentially an advertisement about some product. The last paragraph points to an article without specifying details and the 3rd paragraph says that "inefficient" isn't true in real-life scenarios. All the rest is about the product, not the question. — Panagiotis Kanavos, Jan 15 '15 at 09:03
Most of the bulk of this provides evidence of a *real* solution as opposed to an imagined problem. Blanket claims that something is real, without evidence, are a sure-fire way to get an answer rejected. Yes, the result then reads like "this is a good tool because it does good things". I'd be happy to offer other evidence, but there isn't a lot of it around for which I can offer convincing details. I can answer these questions because I built this thing with my own blood, sweat and tears over the last 20 years. — Ira Baxter, Jan 15 '15 at 09:48
