Binary and unary minus operator in Lexical Analyzer

Question

So, I am doing a lexical analysis of a TOY programming language using flex. I am currently stuck at the following point.

Minus sign: As we know, the minus sign can have two meanings, as a binary or as a unary operator (I also know you can discard the two meanings and just say that -2 is the same as 0-2). Firstly, I have only studied lexical analyzers so far and I don't know anything about parsers. So, should I care about distinguishing these two minus signs, because sometimes the analyzer will print -2 as a numeric literal and sometimes - as an operator and 2 as a numeric literal? Or can this kind of clubbing of two literals be done during parsing? If yes, then should I only define positive numbers and - as a unary operator?

During lexical analysis, you don't need to care about the meaning of lexemes. Just define `-` as `minus`. — Simon Smith, Jan 20 '23 at 07:00
Does this answer your question? [In lex, how do I differentiate between '-' (subtraction) operator and an integer '-3'?](https://stackoverflow.com/questions/35386695/in-lex-how-do-i-differentiate-between-subtraction-operator-and-an-integer) — Piotr Siupa, Jan 20 '23 at 14:49
I know that a C++ compiler implements a negative integer as `-` + `positive integer`. This causes some problems when you try to write very big negative integers. However, I see no reason why a parser couldn't just concatenate a unary `-` with the rest of the integer. — Piotr Siupa, Jan 20 '23 at 14:57

score 2 · Answer 1 · answered Jan 20 '23 at 07:22

Things get really complicated if you try to analyse signed integers as single tokens. For example, you will have to avoid analysing x-1 as { x, -1 }, since that - is an operator.
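As a rough illustration of the two-token approach, here is a minimal hand-written scanner sketch in C (not flex output). The names `tok_kind` and `scan_kinds` are made up for this example. It always emits `-` as its own token and only recognises unsigned digit runs as integers, so `x-1` comes out as identifier, minus, integer.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical token kinds for this sketch; not part of flex. */
enum tok_kind { TOK_IDENT, TOK_INT, TOK_MINUS, TOK_OTHER };

/*
 * Scan `src` and store up to `max` token kinds in `out`.
 * Returns the number of tokens stored. A '-' is always its own token;
 * integer literals are unsigned runs of digits.
 */
size_t scan_kinds(const char *src, enum tok_kind *out, size_t max)
{
    size_t n = 0;
    const unsigned char *p = (const unsigned char *)src;

    while (*p != '\0' && n < max) {
        if (isspace(*p)) {
            p++;                              /* skip whitespace */
        } else if (isdigit(*p)) {
            while (isdigit(*p))
                p++;                          /* unsigned integer literal */
            out[n++] = TOK_INT;
        } else if (isalpha(*p) || *p == '_') {
            while (isalnum(*p) || *p == '_')
                p++;                          /* identifier */
            out[n++] = TOK_IDENT;
        } else if (*p == '-') {
            p++;                              /* never glued to following digits */
            out[n++] = TOK_MINUS;
        } else {
            p++;                              /* anything else: one-char token */
            out[n++] = TOK_OTHER;
        }
    }
    return n;
}
```

With this scanner, both `x-1` and `-2` contain a `TOK_MINUS`; deciding whether it is unary or binary is left to the parser, which knows whether an operand precedes it.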

If you analyse signed integers as two tokens -- a sign and then an integer -- then you will get a sensible parse. There are only two issues:

1. You really don't want to evaluate -1 at runtime. It's a constant value, and it should be evaluated at compile time. But that's no different from any other constant expression. (1+1) should ideally be folded to 2 at compile time. Constant folding is a relatively easy optimization to implement, although you should get things working first.
2. In two's complement, the smallest negative integer (-2147483648 if you're using signed 32-bit integers) causes an integer overflow if the integer part is separated from the sign, because the positive part, 2147483648, does not fit in that type. There are a number of ways of dealing with this, most of which lead to some kind of parsing quirk. For example, a C compiler targeting a 32-bit int will assign a wider type than int to 2147483648, and that means that -2147483647 and -2147483648 are unexpectedly different types. (That's why INT_MIN is usually #defined as (-2147483647 - 1).) On the other hand, if you just assume that the compiler is running in a 2's-complement environment with integers wrapping around, then you'll get the right value for -2147483648. But you'll get the same value for 2147483648, which should be flagged with a diagnostic. For a student project with a toy language, these quirks are probably innocuous, but you should make some attempt to document them.
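One possible way to handle both issues at once is to fold a unary minus into an integer literal at compile time and range-check the result there. The sketch below assumes a signed 32-bit target and a parser that already knows the minus is unary and applies directly to the literal. The function name `fold_int_literal` and its interface are hypothetical, and this is only one of the several approaches mentioned above.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Fold an optional unary minus into a decimal literal for a signed
 * 32-bit target. `digits` is the unsigned digit string from the lexer.
 * Returns false (the caller should emit a diagnostic) if the text is
 * not a decimal number or the value is out of range.
 */
bool fold_int_literal(bool negated, const char *digits, int32_t *out)
{
    /* Largest allowed magnitude: 2147483648 when negated, else 2147483647. */
    const uint64_t limit = (uint64_t)INT32_MAX + (negated ? 1u : 0u);
    uint64_t magnitude = 0;

    if (*digits == '\0')
        return false;                     /* no digits at all */

    for (const char *p = digits; *p != '\0'; p++) {
        if (*p < '0' || *p > '9')
            return false;                 /* not a decimal digit */
        magnitude = magnitude * 10 + (uint64_t)(*p - '0');
        if (magnitude > limit)
            return false;                 /* reject before uint64_t could wrap */
    }

    /* Negate in 64-bit arithmetic so -2147483648 never overflows int32_t. */
    int64_t value = negated ? -(int64_t)magnitude : (int64_t)magnitude;
    *out = (int32_t)value;
    return true;
}
```

Here `fold_int_literal(true, "2147483648", &v)` succeeds with the minimum 32-bit value, while the same digits without the sign are rejected. The quirk this introduces is that the check only applies when the minus directly precedes the literal; in `x - 2147483648` the minus is binary, so the literal alone is out of range.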

I was also thinking the same thing. Thanks for confirming it. :) — Deepak Sangle, Jan 20 '23 at 07:35
