1

If I have a language with basic things such as

a = expression

if expression then ...

while expression do ...

then I might have a grammar that looks like this: (pseudo code)

assignment: identifier Equals expression ;
if        : If expression Then ...
while     : While expression Do ...

Now the type of the first expression just needs to be type compatible with 'a' but the other two expressions must be of type boolean.

While it's easy to check the type of an expression everywhere, it seems to me that it would be very convenient to define in the grammar

boolExpression : expression;

and then my other rules would look like this:

assignment: identifier Equals expression ;
if        : If boolExpression Then ...
while     : While boolExpression Do ...

This would allow me to check that boolExpression returns a BOOL type and so I wouldn't have to add code to test every expression.

But have I turned the context free grammar into a context sensitive grammar by doing this? And if so, does it matter?

David
  • 5,991
  • 5
  • 33
  • 39

2 Answers2

2

If you have boolean variables, then you cannot syntactically verify whether an expression is boolean or not. If you want to catch all type checks, you will need some kind of semantic feedback, which will indeed make the language context-sensitive, and also make left-to-right parsing impossible unless you require variables to be declared before use. [See note 1] Semantic feedback of this form is generally considered to be a complication, since it produces a tight coupling between lexical analysis and parsing, but many parsers do it, either for convenience or necessity.

But perhaps you are content with detecting only certain contexts. If so, your grammar is certainly workable and does not create context sensitivity. if and while statements definitely provide a syntactically-determined boolean context. Arithmetic operators provide a determined numeric context (although you might need type checking if you have multiple arithmetic types.) But there are contexts which are not type-determined syntactically, such as assignment statements, equality operators, and function calls. So you will end up doing special-case type checks in various places in the walk.

Personally, I don't see the advantage to trying to complicate the grammar to try to do partial type-checking; I prefer to do a type validation/deduction pass after the parse. But in some use cases, a less flexible approach is perfectly reasonable.

Notes

  1. Some languages are very firm about declaration before use (C, for example) but others allow the programmer to logically order declaration of variables without having to worry about references to them (C++ class declarations, for example.)

    Personally, I prefer to not complicate the programmers life by insisting on declaration before use. Indeed, as a programmer I appreciate languages which can deduce variable types rather than insisting on redundant declarations. But tastes differ. In any event, mandatory declaration does not necessarily imply declaration before use, and many languages with mandatory declaration still provide flexibility about declaration order, allowing (for example) mutually recursive functions without redundant forward declarations.

Community
  • 1
  • 1
rici
  • 234,347
  • 28
  • 237
  • 341
  • I do require variables to be declared before use. It simplifies the programmer's life because it prevents the use of a misspelled variable. The benefit (or not) of type inference is beyond the scope of my question. Defining a booleanExpression non-terminal allows me to put the type-checking code in a single place, the visitBooleanExpression method of the visitor tree rather than having to add extra code in every visitor method that uses an expression. That seemed quite beneficial hence my question as to whether making the grammar context-sensitive in this way really matters in practice. – David Feb 04 '18 at 14:13
  • @david: And I tried to indicate the ways that it makes a difference in practice. You are certainly free to decide these differences don't affect your use case. (I disagree about declaration before use; C++ also produces errors for misspelled variables without preventing me from gathering all member variable declarations in one place. But it's a question of style.) – rici Feb 04 '18 at 15:45
  • @david: However, reading your comment makes me think that I might have misunderstood your intentions. Or at least not communicated fully my thoughts. I tried to rewrite the answer in line with that reconsideration. – rici Feb 04 '18 at 16:25
  • I'm implementing a reasonably simple strongly typed language. Since the parser uses ANTLR4 to build the tree, the only difference between using 'expression' and 'boolExpression' is that I get separate visit methods and so when I'm traversing the tree doing semantic analysis and code generation, I can check in one place that the actual type found is a boolean rather than having to insert that checking code in all the places where I refer to an expression but require the type to be boolean. – David Feb 04 '18 at 17:40
  • @david. ok. Go for it. – rici Feb 04 '18 at 20:47
  • I just saw your complete post - don't know why I didn't see it when I was responding. I'm not doing any type checking during the parse phase (in fact I love that ANTRL4 lets you defer ALL of that until you visit the tree later, leaving you with a 100% clean grammar). The parse is still context free. I was just taking advantage of using separate non-terminals so that I would get separate visitor methods so that I could consolidate my type checking but wondering if I was "breaking" the theory. – David Feb 04 '18 at 23:22
1

No, those changes do not turn your context-free grammar into a context-sensitive grammar.

You can tell the difference just by looking at the left-hand-sides of the productions: as long as every LHS is a single symbol, it's a CFG. If any LHS has multiple symbols, then it's a CSG.

(That's a bit of an oversimplification, but it's accurate enough to answer your question.)

That's grammars, but when it comes to languages, note that the language your grammar generates is different from the language of type-valid programs. The former is context-free, but the latter probably isn't.

Michael Dyck
  • 2,153
  • 1
  • 14
  • 18
  • Duh - of course - serves me right for not having looked at my compiler books for 30 years. I completely forgot the meaning of context-sensitive. Thanks to both you and rici for your very helpful comments – David Feb 04 '18 at 23:24