I'm taking the Nand-2-Tetris course. We are asked to write an assembler. A C-command has the form dest=comp;jump, where each part is optional.

I am trying to write a regex to make this easier: I want to match the instruction on a given line and know, just from the group number, which part of the instruction I have. For example, for the expression A=M+1;JMP I want to get group(1) = A, group(2) = M and group(3) = JMP.

My problem is that each part is optional, so I don't know exactly how to write this regex. So far I have come up with:

(A?M?D?)\s=([^;\s]*)\s?(?=;[\s]*([a-zA-Z]{1,4})|$)

This works for most cases, but not all. For example, an instruction with no dest part, such as D;JGT, does not match. I have tried positive lookahead but it didn't work.

Tunaki
Dvir Itzko
  • An assembler is a _parser_, and while regex may certainly be a tool you use, it is not the only concern. – Tim Biegeleisen Oct 02 '16 at 07:59
  • Agreed. This looks like a case of "If all you have is a hammer, every problem looks like a nail". Regex is a powerful tool, but using it for this kind of task comes at the cost of your time and that of anyone who has to read your code. – MadOverlord Oct 02 '16 at 11:45
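As the comments suggest, a regex is not the only option here. Below is a minimal non-regex sketch in Python, assuming whitespace carries no meaning inside an instruction. The function name split_c_instruction is hypothetical, and checking the three fields against the Hack mnemonic tables is deliberately left out.

```python
def split_c_instruction(line):
    """Split 'dest=comp;jump' into (dest, comp, jump) without a regex.

    dest and jump are None when the instruction omits them.
    """
    text = "".join(line.split())         # drop all whitespace
    rest, _, jump = text.partition(";")  # jump is '' when there is no ';'
    head, eq, tail = rest.partition("=")  # eq is '' when there is no '='
    dest, comp = (head, tail) if eq else (None, head)
    return dest, comp, jump or None


print(split_c_instruction("A=M+1;JMP"))  # ('A', 'M+1', 'JMP')
print(split_c_instruction("D;JGT"))      # (None, 'D', 'JGT')
print(split_c_instruction("M=D"))        # ('M', 'D', None)
```

Because the split relies only on the two separator characters, comp values such as !M or -M need no special handling.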

2 Answers

The RegEx that you are looking for is as follows:

(?P<dest>[AMD]{1,3}=)?(?P<comp>[01\-AMD!|+&><]{1,3})(?P<jump>;[JGTEQELNMP]{3})?

Let's break it down into parts:

  • (?P<dest>[AMD]{1,3}=)? - matches the optional destination (one to three of A, M, D) that receives the computation result, together with its trailing =.
  • (?P<comp>[01\-AMD!|+&><]{1,3}) - matches the computation, one to three characters long.
  • (?P<jump>;[JGTEQELNMP]{3})? - matches the optional three-letter jump mnemonic, together with its leading ;.

Note that the dest and jump parts of a C-instruction are optional.
When present, dest is captured with its trailing = and jump with its leading ;.

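Hence, you will have to strip these signs after matching:

# dest and jump hold the matched groups, e.g. m.group("dest") and m.group("jump")
if dest is not None:
    dest = dest.rstrip("=")  # "A=" -> "A"

if jump is not None:
    jump = jump.lstrip(";")  # ";JMP" -> "JMP"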

Finally, you will get the desired C-instruction parsing:

For the line A=A+M;JMP you will get:

dest = 'A'
comp = 'A+M'
jump = 'JMP'

For the line D;JGT you will get:

dest = None
comp = 'D'
jump = 'JGT'

And for the line M=D you will get:

dest = 'M'
comp = 'D'
jump = None
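
Putting the pieces together, the pattern and the stripping step can be sketched as one function. The names C_INSTRUCTION and parse_c_instruction are made up for illustration, and using fullmatch (so that the whole line must be a C-instruction) is an assumption rather than something the steps above require.

```python
import re

# Pattern copied from the answer above. dest keeps its trailing '=' and
# jump keeps its leading ';', so both are stripped after matching.
C_INSTRUCTION = re.compile(
    r"(?P<dest>[AMD]{1,3}=)?"
    r"(?P<comp>[01\-AMD!|+&><]{1,3})"
    r"(?P<jump>;[JGTEQELNMP]{3})?"
)


def parse_c_instruction(line):
    """Return (dest, comp, jump); dest and jump are None when absent."""
    m = C_INSTRUCTION.fullmatch(line.strip())
    if m is None:
        raise ValueError(f"not a C-instruction: {line!r}")
    dest, comp, jump = m.group("dest", "comp", "jump")
    if dest is not None:
        dest = dest.rstrip("=")
    if jump is not None:
        jump = jump.lstrip(";")
    return dest, comp, jump


print(parse_c_instruction("A=A+M;JMP"))  # ('A', 'A+M', 'JMP')
print(parse_c_instruction("D;JGT"))      # (None, 'D', 'JGT')
print(parse_c_instruction("M=D"))        # ('M', 'D', None)
```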
Yair

Not quite sure what you want to do, but based on your examples you can make a regular expression like this:

([\w]+)[=]?([\w])*[+-]*[\w]*;([\w]+)

Then for this line:

A=M+1;JMP

You'll get the following:

Full match  A=M+1;JMP
Group 1     A
Group 2     M
Group 3     JMP

And for this line:

D;JGT

You'll get:

Full match  D;JGT
Group 1     D
Group 3     JGT

See example here: https://regex101.com/r/v8t4Ma/1
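
The same results can be checked outside regex101 with Python's re module. This is only a sketch: the helper name groups_or_none is made up for illustration, and fullmatch is used so that the whole line has to match. Because the pattern requires a ; it returns no match for a line that has no jump part.

```python
import re

# Pattern copied from this answer; the ';' is mandatory in it.
PATTERN = re.compile(r"([\w]+)[=]?([\w])*[+-]*[\w]*;([\w]+)")


def groups_or_none(line):
    """Return the three groups as a tuple, or None if the line does not match."""
    m = PATTERN.fullmatch(line)
    return m.groups() if m else None


print(groups_or_none("A=M+1;JMP"))  # ('A', 'M', 'JMP')
print(groups_or_none("D;JGT"))      # ('D', None, 'JGT')
print(groups_or_none("A=M+1"))      # None, because there is no ';'
```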

  • This is still not good enough. I also need `A=M+1` (with no jump part) to work, and I need to support operations such as !M (negate), -M, etc. That is why my regex doesn't specify which characters are permitted in comp; I just made sure I'm not capturing something I don't want (`[^;\s]`). – Dvir Itzko Oct 03 '16 at 08:00
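
One way to meet the requirements in this comment is to keep comp permissive and make both separators optional. The pattern below is a hypothetical sketch, not something taken from either answer: it assumes surrounding whitespace has been stripped, captures the whole comp (for example M+1) in group 2, and does not validate the mnemonics.

```python
import re

# Hypothetical pattern: group 1 = dest, group 2 = comp, group 3 = jump.
# comp is "anything except ';', '=' and whitespace", so !M and -M pass.
LOOSE = re.compile(r"(?:([AMD]{1,3})=)?([^;=\s]+)(?:;([A-Za-z]{1,4}))?")


def loose_groups(line):
    """Return (dest, comp, jump) or None; absent parts come back as None."""
    m = LOOSE.fullmatch(line.strip())
    return m.groups() if m else None


print(loose_groups("A=M+1"))      # ('A', 'M+1', None)
print(loose_groups("D;JGT"))      # (None, 'D', 'JGT')
print(loose_groups("!M;JMP"))     # (None, '!M', 'JMP')
print(loose_groups("A=M+1;JMP"))  # ('A', 'M+1', 'JMP')
```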