
I am investigating how the syntactic evolution of a language affects its semantics. For instance, Java's for-loop syntax evolved in version 5 into a more compact form. Do the designers have to prove that the semantics are still preserved even with this new syntax? Maybe this is a trivial example.
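To make the example concrete, here is the same computation written in the compact Java 5 syntax and in the older verbose syntax (the language spec in fact defines the new form by translating it into something like the old one):

```java
public class ForLoopSugar {
    // Java 5 compact syntax (enhanced for loop)
    static int sumCompact(int[] xs) {
        int total = 0;
        for (int x : xs) {
            total += x;
        }
        return total;
    }

    // Pre-Java-5 verbose syntax; for arrays, the spec defines the
    // enhanced for loop as equivalent to this index-based form
    static int sumVerbose(int[] xs) {
        int total = 0;
        for (int i = 0; i < xs.length; i++) {
            total += xs[i];
        }
        return total;
    }

    public static void main(String[] args) {
        int[] xs = {1, 2, 3, 4};
        System.out.println(sumCompact(xs) + " == " + sumVerbose(xs));
    }
}
```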

So, in general, how can one prove that a language's semantics are still preserved even when its syntax has evolved from very verbose to compact?

Many thanks in advance for any insights/links.

Ketan

Ketan Maheshwari
  • So you're asking how one can prove that syntactic sugar is implemented properly? o.O –  Sep 23 '10 at 19:01
  • well, my example was about syntactic sugar, but what if there is a significant transition? Let me try another example: say from a procedural-style syntax to an array-programming style, for example C = A + B, where A, B, and C are arrays and the semantics are to add the contents at the corresponding indices of arrays A and B and store them in C. – Ketan Maheshwari Sep 23 '10 at 19:12
  • 1
    I still don't understand what you want a proof for: That the concrete implementation is correct? That the definition in the language spec is in synch with e.g. the mathematical definition? That the simplified describtion in the beginner's tutorial is accurate? ...? –  Sep 23 '10 at 19:17
  • I'm struggling with the procedural-to-array style change. What were the semantics in the old style that were to be preserved? Isn't this introducing a new syntax along with some new semantics? (At least, this was the case in the FORTRAN 77 -> Fortran 90 change just like this one: the array style used to be a syntax error with no semantics at all.) – Andrew Walker Sep 23 '10 at 19:21
  • 1
    Some more specific details: we have an interpreter that understands language A in a very verbose syntax. Now, we invented a new language B with a very compact syntax that is a complete departure from that of A. So, a user can now write the code in compact language B, translate to the verbose language A using a translator program that I have written. The problem is how to prove/guarantee that *all* possible such translations will *preserve* the semantics of the original language A that the interpreter understands. – Ketan Maheshwari Sep 23 '10 at 19:23
  • Andrew, I am not aware of the FORTRAN thing, will look it up. But yes, in our verbose language we have a longer way to represent array operations that are now possible to accomplish with this syntax. We do checks to make sure the arrays in question are compatible: dimension, member types, length. – Ketan Maheshwari Sep 23 '10 at 19:27
  • 2
    IMO you are trying to prove the wrong thing. What you acutally have to prove is that the result of your translation does what it is expected to do, according to the specification of B, the translation rules and the specification of A. There may be many aspects of A that are unreachable in B. Consider translating C++ into assmbler code. That's what every C++ compiler does (more or less). Assembler code can do things that cannot be done safely in C++, like calculated jumps or self-modifying code, so you could not prove that "the semantics of assembler are preserved in C++". – Erich Kitzmueller Sep 23 '10 at 21:25
  • ammoQ, What you are saying is the exact same thing : "...to prove is that the result of your translation does what it is expected to do ..." which means to prove that the semantics are preserved while translation. The other point is also relevant that A is a superset of B and that is ok. I know that to be a case for most languages. Many thanks for your input. It builds up on my arguments. – Ketan Maheshwari Sep 24 '10 at 09:15

2 Answers


Okay, your last comment is much more answerable.

Some more specific details: we have an interpreter that understands language A in a very verbose syntax. Now, we invented a new language B with a very compact syntax that is a complete departure from that of A. So, a user can now write the code in compact language B, translate to the verbose language A using a translator program that I have written. The problem is how to prove/guarantee that all possible such translations will preserve the semantics of the original language A that the interpreter understands.

The short answer is: you don't. For one thing, when you add syntactic sugar you usually just capture a well-known, widely used pattern and give it special, nicer syntax - you don't replace large parts of the language's syntax. For such small additions, the translation can be specified with a description and examples - for example, PEP 343 defines Python's "with" statement in a fairly informal style, by giving its translation into a try/finally pattern.

Now, when the change in syntax is so radical that the new language has hardly anything in common with the backend language, we're no longer talking about a change of syntax - we're talking about a compiler. And compilers aren't usually proven correct either. Well, some people actually try it. But for real-world compilers this rarely happens; instead, correctness is checked by testing - by countless users and their programs. And of course, all serious language implementations have a wide range of test cases (read: example programs, from basic to absurd) that should run and pass (or, in some cases, produce an error) at least in official releases. When they do (and the test suite is worth its salt), you still don't know that there are no bugs, but at least you have some confidence. As Dijkstra said: "Testing shows the presence, not the absence of bugs."
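That testing approach can be sketched with the array-addition example from the comments: implement elementwise addition once in the verbose index-by-index style and once in a compact style, then check that the two agree on many random inputs. (Both methods here are illustrative stand-ins, not code from any actual translator.)

```java
import java.util.Arrays;
import java.util.Random;
import java.util.stream.IntStream;

public class ArrayAddEquivalence {
    // Verbose style: explicit loop over indices
    static int[] addVerbose(int[] a, int[] b) {
        int[] c = new int[a.length];
        for (int i = 0; i < a.length; i++) {
            c[i] = a[i] + b[i];
        }
        return c;
    }

    // Compact style: what a hypothetical "C = A + B" syntax might
    // translate to
    static int[] addCompact(int[] a, int[] b) {
        return IntStream.range(0, a.length)
                .map(i -> a[i] + b[i])
                .toArray();
    }

    public static void main(String[] args) {
        Random rng = new Random(42); // fixed seed for reproducibility
        for (int trial = 0; trial < 1000; trial++) {
            int n = rng.nextInt(20);
            int[] a = rng.ints(n).toArray();
            int[] b = rng.ints(n).toArray();
            if (!Arrays.equals(addVerbose(a, b), addCompact(a, b))) {
                throw new AssertionError("semantics diverge on trial " + trial);
            }
        }
        System.out.println("1000 random cases agree");
    }
}
```

This gives no proof, only confidence - exactly the trade-off described above.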

  • delnan, Many thanks for this nice, informative and insightful post. This presents a very nice argument to the whole problem. I never realized what I am calling a translator is indeed a form of compiler! – Ketan Maheshwari Sep 23 '10 at 19:50

Prove that every syntax extension would have been illegal in older versions of the language.

For obvious reasons, new syntactic elements should be introduced in a way that would have been syntactically illegal in the older version of the language. That is why most languages have a list of reserved words that goes beyond the keywords already in use.

For example, when C# introduced the var keyword in version 3.0, there was a potential problem: var was not a reserved word in C# 1.0 (nor in 2.0), so a program could have legally created a type called var. To keep such programs compiling, var was made a contextual keyword rather than a reserved word: it is only treated specially where no type named var applies.
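Java ran into the same issue twice, with the two possible outcomes. A compilable sketch:

```java
public class ReservedWords {
    static int varAsIdentifier() {
        // 'var' (Java 10) is an identifier with special meaning, not a
        // hard keyword, so pre-existing variables named var still compile.
        // Only type declarations like 'class var {}' are rejected.
        int var = 3;
        return var;
    }

    public static void main(String[] args) {
        // By contrast, 'enum' became a real keyword in Java 5, so code
        // like 'int enum = 3;' that was legal in Java 1.4 stopped
        // compiling - exactly the kind of breakage described above.
        System.out.println(varAsIdentifier());
    }
}
```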

Whether the semantics of the old language elements stay unchanged is more a matter of how the compiler is built, since the specification rarely changes. If it does change, well, then the semantics are not preserved. The exception is a new specification that fixes things which would have constituted undefined behaviour (while still being syntactically legal) in previous versions of the specification. C comes to mind.

Erich Kitzmueller
  • Nah, that's just backwards compatibility - proving that the semantics of the old syntax don't change. –  Sep 23 '10 at 19:25
  • delnan is right. What if C# introduces a completely new way to express loops/conditionals, or implicit operator overloading? Is it possible to prove that the two syntaxes are equivalent? BTW, to me, "syntactic equivalence" is synonymous with "semantics preservation" for one interpreter and two language syntaxes. – Ketan Maheshwari Sep 23 '10 at 19:36
  • OK, looks like I misunderstood your question. How can I prove that a foreach loop does the same as a for loop? By specifying it that way! – Erich Kitzmueller Sep 23 '10 at 21:28