
I am implementing a compiler that uses Parsec/Megaparsec as its parser. I can't find a way to overload the + operator, which I want to use for both integer addition and string concatenation. Is this possible?

Carl
sinoTrinity
  • It's possible, but it needs *far* more information than you've provided. Can you ask a concrete question? Something along the lines of "this is a small self-contained representative of the code I'm having trouble with, these are the exact problems I'm having." – Carl Jul 04 '19 at 00:51
  • You could parse raw syntax & have a typechecking phase afterwards. – gallais Jul 04 '19 at 09:03
  • One simple way to do it is to make String an instance of the Num class and define the class method (+) to be (++). But then the compiler will expect implementations of all the methods of Num. Also a quick test showed that you need to turn on 'Flexible Instances'. I'm not an expert, but it feels very unnatural to me. – Mark Wildon Jul 04 '19 at 17:24

1 Answer


Start by writing your compiler to use a different operator for string concatenation, like @. When you've got it running and well tested, take a look at your code. You'll probably discover that one of two things has happened, depending on the language you're parsing and the compiler architecture you've used.

The first possibility is that the part of your parser that parses + is completely separate from the part of the parser that parses @ (e.g., they are in two different parser functions, one for parsing numeric expressions and a separate one for parsing string expressions). If this has happened, congratulations, you just need to replace "@" with "+", run a few tests, and you should be good to go.
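As a rough illustration of this first possibility, here is a minimal sketch of a grammar in which numeric and string expressions are parsed by two separate functions that already share the "+" token. To stay dependency-free it uses ReadP from base rather than Parsec/Megaparsec, and every name in it (NumExpr, StrExpr, lexeme, parseExpr, and so on) is hypothetical rather than taken from the question:

```haskell
import Data.Char (isDigit)
import Text.ParserCombinators.ReadP
  (ReadP, between, chainl1, char, eof, munch, munch1, readP_to_S, skipSpaces, (+++))

-- Hypothetical ASTs: numbers and strings live in separate types.
data NumExpr = NumLit Int | Plus NumExpr NumExpr deriving (Eq, Show)
data StrExpr = StrLit String | Concat StrExpr StrExpr deriving (Eq, Show)
data Expr = N NumExpr | S StrExpr deriving (Eq, Show)

-- Run a parser, then skip trailing whitespace.
lexeme :: ReadP a -> ReadP a
lexeme p = p <* skipSpaces

-- The single '+' token shared by both sub-grammars.
plusSign :: ReadP ()
plusSign = () <$ lexeme (char '+')

-- Left-associative chain of integer literals separated by '+'.
numExpr :: ReadP NumExpr
numExpr = chainl1 numLit (Plus <$ plusSign)
  where numLit = NumLit . read <$> lexeme (munch1 isDigit)

-- Left-associative chain of string literals separated by '+'.
strExpr :: ReadP StrExpr
strExpr = chainl1 strLit (Concat <$ plusSign)
  where strLit = StrLit <$> lexeme (between (char '"') (char '"') (munch (/= '"')))

-- The operands decide which sub-parser succeeds, so '+' is unambiguous.
expr :: ReadP Expr
expr = (N <$> numExpr) +++ (S <$> strExpr)

-- Accept only parses that consume the whole input.
parseExpr :: String -> Maybe Expr
parseExpr s = case [e | (e, "") <- readP_to_S (skipSpaces *> expr <* eof) s] of
  [e] -> Just e
  _   -> Nothing
```

In this shape, mixing operand kinds (for example 1 + "a") is rejected by the parser itself, so no later type check is needed for "+". That only works because literals reveal their type syntactically; once variables are involved, the second possibility is the more likely one.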

The second possibility is that + and @ are parsed in the same place and produce AST nodes with different constructors:

data Expr ... =
  ...
  | Plus Expr Expr     -- the '+' operator
  | Concat Expr Expr   -- the '@' operator
  ...

In this case, you probably also have some part of your compiler that's generating code (and hopefully some type information):

codeGen (Plus e1 e2)
  = do (code1, typ1) <- codeGen e1
       (code2, typ2) <- codeGen e2
       if (typ1 == Num && typ2 == Num)
         then genPlus code1 code2
         else typeError "'+' needs numbers"
codeGen (Concat e1 e2)
  = do (code1, typ1) <- codeGen e1
       (code2, typ2) <- codeGen e2
       if (typ1 == Str && typ2 == Str)
         then genConcat code1 code2
         else typeError "'@' needs strings"

If your compiler has this second shape, modify the parser and AST so that both operators collapse into one shared constructor:

data Expr ... =
  ...
  | ConcatPlus Expr Expr     -- the '+' operator for numbers and strings
  ...

and handle both cases in the code generator, depending on available type information:

codeGen (ConcatPlus e1 e2)
  = do (code1, typ1) <- codeGen e1
       (code2, typ2) <- codeGen e2
       case (typ1, typ2) of
           (Num, Num) -> genPlus code1 code2
           (Str, Str) -> genConcat code1 code2
           _ -> typeError "'+' arguments must have same type (numbers or strings)"
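To make the shape of that last step concrete, here is a small self-contained sketch of a type-directed code generator for the shared constructor. It assumes a toy stack-machine target and reports errors through Either String; the literal constructors (IntLit, StrLit), the Instr type, and the instruction names are all hypothetical stand-ins for whatever your compiler actually uses:

```haskell
-- Types the generator can infer for an expression.
data Type = Num | Str deriving (Eq, Show)

-- Hypothetical AST: one constructor covers '+' for numbers and strings.
data Expr
  = IntLit Int
  | StrLit String
  | ConcatPlus Expr Expr
  deriving (Eq, Show)

-- Hypothetical stack-machine instructions (an assumption, not a real target).
data Instr = PushInt Int | PushStr String | AddInt | ConcatStr
  deriving (Eq, Show)

-- Generate code bottom-up, returning the code together with its type.
codeGen :: Expr -> Either String ([Instr], Type)
codeGen (IntLit n) = Right ([PushInt n], Num)
codeGen (StrLit s) = Right ([PushStr s], Str)
codeGen (ConcatPlus e1 e2) = do
  (code1, typ1) <- codeGen e1
  (code2, typ2) <- codeGen e2
  -- The operand types, not the syntax, select the operation.
  case (typ1, typ2) of
    (Num, Num) -> Right (code1 ++ code2 ++ [AddInt], Num)
    (Str, Str) -> Right (code1 ++ code2 ++ [ConcatStr], Str)
    _          -> Left "'+' arguments must have same type (numbers or strings)"
```

With these definitions, codeGen (ConcatPlus (IntLit 1) (IntLit 2)) yields an AddInt program typed Num, the same expression over two StrLit values yields a ConcatStr program typed Str, and a mixed pair returns the Left error. The parser never needs to know which meaning of "+" applies.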

If your compiler doesn't look like these examples, you'll have to post some code so we know what it does look like.

K. A. Buhr