Better to be more explicit or less explicit in a parsing grammar?

Question

Let's say I have a SQL-like language that can test whether two expressions are equal only if they have the same type; if the types differ, it raises an error. For example:

1 = 1         # true
1 = 1.2       # false
1 = '1'       # error
1 = '1'::int  # true

What would the best production for this be?

EQ:      expr '='  expr

Or something much more detailed, which would attempt to catch type errors at the lexing (parsing?) stage, such as:

EQ:      numeric_expr '=' numeric_expr
       | string_expr  '=' string_expr
       | ...etc

I know of only one grammar like this, [the algol60 grammar](https://github.com/antlr/grammars-v4/tree/master/algol60), which comes directly from the 1960s spec. In that grammar, booleans are handled by extra rules. The problem is that this approach breaks down once you add expressions with variables: you would then have to add code to compute types into the grammar, which complicates both the grammar and error recovery. People wised up and don't do that anymore. But this is a great question. They don't teach people how to write grammars because industry doesn't believe in grammars, let alone specs. — kaby76, Aug 07 '22 at 11:44

score 2 · Accepted Answer · answered Aug 07 '22 at 04:44

SO is notoriously hostile to questions that ask for an opinion. But in this case, I think there is an objective preference, so I'll hazard an answer.

You cannot, in general, detect a type mismatch in a context-free grammar, for the simple reason that the type of a variable (or fieldname, or whatever) is not part of its syntax. Looking up the type of a variable is, pretty well by definition, not context-free.

Of course, you can do some very limited typechecking on constant expressions, but that's not a particularly interesting case. Most SQL queries do not involve comparing two literal values.
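A minimal Python sketch of the distinction drawn in the two paragraphs above. The token regular expressions, the type names `int`, `float` and `string`, the function `operand_type`, and the `schema` dict mapping column names to types are all assumptions made up for illustration, not part of the question's language. A literal's type can be read off its spelling (the limited constant-only check), while an identifier's type can only come from a lookup in declarations made elsewhere.

```python
import re

# Token shapes for a toy SQL-like language (assumed for this sketch).
INT_RE = re.compile(r"\d+")
FLOAT_RE = re.compile(r"\d+\.\d+")
STRING_RE = re.compile(r"'[^']*'")
IDENT_RE = re.compile(r"[A-Za-z_]\w*")


def operand_type(token: str, schema: dict[str, str]) -> str:
    """Return the type of a single operand token."""
    # Literals: the spelling alone fixes the type, so a context-free
    # grammar could in principle tell 1 apart from '1'.
    if INT_RE.fullmatch(token):
        return "int"
    if FLOAT_RE.fullmatch(token):
        return "float"
    if STRING_RE.fullmatch(token):
        return "string"
    # Identifiers: the spelling says nothing about the type. It has to be
    # looked up in a declaration made elsewhere, i.e. in the context.
    if IDENT_RE.fullmatch(token):
        if token not in schema:
            raise LookupError(f"unknown column {token!r}")
        return schema[token]
    raise ValueError(f"not an operand: {token!r}")


# The same token gets different types under different (hypothetical) schemas.
print(operand_type("'1'", {}))                     # string
print(operand_type("price", {"price": "int"}))     # int
print(operand_type("price", {"price": "string"}))  # string
```

The token `price` is lexically identical in the last two calls; only the schema differs, which is exactly the information a context-free grammar has no access to.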

Trying to classify expression productions syntactically by type will almost certainly lead to grammar conflicts. But even if you somehow manage to do it, you will then have to try to construct a sensible error message, since flagging a type mismatch as "Syntax error" is highly misleading to the programmer. And producing good error messages during a parse is much harder than you might think.

By contrast, it is very easy to catch type errors by analysing the parse tree created by the parser, at least if there is some mechanism to identify the type of a named object. Furthermore, if you are using a typechecker to check for type mismatches, rather than a general-purpose parser, it is almost trivial to produce a meaningful error message.
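A minimal Python sketch of that approach, under stated assumptions: the tokenizer regex, the node classes (`Literal`, `Column`, `Cast`, `Eq`), `parse_eq`, `type_of`, `TypeCheckError` and the `schema` of column types are hypothetical names invented for this example. The parser implements only the untyped production `EQ: expr '=' expr` (with operands restricted to literals, columns and `::` casts), so `1 = '1'` parses successfully; the mismatch is caught afterwards by walking the tree, where a precise message is easy to produce.

```python
import re
from dataclasses import dataclass

class TypeCheckError(Exception):
    """A semantic (type) error, reported separately from SyntaxError."""

# Parse-tree nodes. There is one Eq node, matching the untyped
# production  EQ: expr '=' expr  (no numeric_expr/string_expr split).
@dataclass
class Literal:
    spelling: str  # e.g. "1", "1.2", "'1'"
    type: str      # "int", "float" or "string", read off the spelling

@dataclass
class Column:
    name: str

@dataclass
class Cast:
    operand: object
    target: str    # e.g. "int" in  '1'::int

@dataclass
class Eq:
    left: object
    right: object

# Any stray character becomes its own token and is rejected by the parser.
TOKEN_RE = re.compile(r"::|=|\d+\.\d+|\d+|'[^']*'|[A-Za-z_]\w*|\S")
IDENT_RE = re.compile(r"[A-Za-z_]\w*")

def parse_operand(tokens, i):
    """operand := (literal | column) ('::' type)*"""
    if i >= len(tokens):
        raise SyntaxError("unexpected end of input")
    tok = tokens[i]
    if re.fullmatch(r"\d+", tok):
        node = Literal(tok, "int")
    elif re.fullmatch(r"\d+\.\d+", tok):
        node = Literal(tok, "float")
    elif re.fullmatch(r"'[^']*'", tok):
        node = Literal(tok, "string")
    elif IDENT_RE.fullmatch(tok):
        node = Column(tok)
    else:
        raise SyntaxError(f"expected an operand, got {tok!r}")
    i += 1
    while i < len(tokens) and tokens[i] == "::":
        if i + 1 >= len(tokens) or not IDENT_RE.fullmatch(tokens[i + 1]):
            raise SyntaxError("expected a type name after '::'")
        node, i = Cast(node, tokens[i + 1]), i + 2
    return node, i

def parse_eq(src):
    """EQ := operand '=' operand  (one production for every type)"""
    tokens = TOKEN_RE.findall(src)
    left, i = parse_operand(tokens, 0)
    if i >= len(tokens) or tokens[i] != "=":
        raise SyntaxError("expected '='")
    right, i = parse_operand(tokens, i + 1)
    if i != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[i]!r}")
    return Eq(left, right)

NUMERIC = {"int", "float"}

def type_of(node, schema):
    """Type-check the tree; `schema` maps column names to type names."""
    if isinstance(node, Literal):
        return node.type
    if isinstance(node, Column):
        if node.name not in schema:
            raise TypeCheckError(f"unknown column {node.name!r}")
        return schema[node.name]
    if isinstance(node, Cast):
        type_of(node.operand, schema)  # check the operand; every cast is allowed here
        return node.target
    if isinstance(node, Eq):
        lt, rt = type_of(node.left, schema), type_of(node.right, schema)
        # int and float are comparable: the question says 1 = 1.2 is false, not an error.
        if lt != rt and not (lt in NUMERIC and rt in NUMERIC):
            raise TypeCheckError(f"cannot compare {lt} with {rt} using '='")
        return "bool"
    raise TypeError(f"unexpected node {node!r}")

schema = {"price": "int", "name": "string"}  # hypothetical table columns
for src in ["1 = 1", "1 = 1.2", "1 = '1'", "1 = '1'::int", "price = name"]:
    try:
        type_of(parse_eq(src), schema)
        print(src, "-> ok")
    except TypeCheckError as err:
        print(src, "-> type error:", err)
```

Run as-is, the loop reports `1 = '1'` and `price = name` as type errors naming both operand types, while genuine syntax errors such as `1 =` still surface as `SyntaxError` from the parser, so the two kinds of failure stay distinct. Treating `int` and `float` as comparable is a guess based on the question's `1 = 1.2  # false` example.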

So it's not really a question of "explicitness". It's a question of detecting errors at the correct point during program analysis.

I agree. The border between syntax and semantics is wide and clear here. You can tell by how bloated the type-checking code would be. This is a clue that you're doing something a syntax parser is not good at. — Piotr Siupa, Aug 07 '22 at 08:42
