I'm writing a virtual machine that directly executes my own assembly code. Here's an example:

add r1, r2, r3 ; Add the values of reg2 and reg3 and store the result in reg1

As you can see, the instructions are laid out like this:

INSTRUCTION PARAMETERS OPTIONAL_COMMENT

Would it be best to use regular expressions to parse this, or would it be OK to parse it line by line and split each line on spaces?
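
For a layout this regular, plain string splitting is probably enough. Here is a minimal Python sketch of the line-by-line approach. The name `parse_line` is made up for illustration, and the sketch assumes that `;` always starts a comment (so it would need more care if string literals are added later) and that operands are separated by commas:

```python
def parse_line(line):
    """Split one source line into (mnemonic, operands).

    Returns None for blank lines and comment-only lines.
    """
    # Everything after the first ';' is the optional comment.
    code = line.split(";", 1)[0].strip()
    if not code:
        return None

    # The first whitespace-delimited word is the instruction name.
    parts = code.split(None, 1)
    mnemonic = parts[0].lower()
    rest = parts[1] if len(parts) > 1 else ""

    # The remaining text is a comma-separated operand list.
    operands = [op.strip() for op in rest.split(",") if op.strip()]
    return mnemonic, operands
```

Calling `parse_line("add r1, r2, r3 ; Add the values")` returns `("add", ["r1", "r2", "r3"])`. Splitting on commas rather than on every space keeps operands intact even when the spacing varies.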

Seki
user3318845
  • This isn't clear; how would one use a regex to interpret something? – Oliver Charlesworth Mar 08 '14 at 19:58
  • @OliCharlesworth Sorry about that; I changed the question. I'd like to know whether it's better to parse using a regex, or to convert each line to an array and then split it at each space to isolate each part of the instruction. – user3318845 Mar 08 '14 at 20:00
  • You might want to consider using a lexer and parser to do this. I don't know if that'd be too complex for what you're doing, though. – kabb Mar 08 '14 at 20:06
  • Don't you also want labels and assembler directives? – pat Mar 08 '14 at 20:11
  • @kabb I'm just concerned that using a lexer and parser would create too much overhead in the VM. I want it to be as lightweight as possible. – user3318845 Mar 08 '14 at 20:12
  • You should really consider parsing the input only once and converting it to byte code, which will be far more efficient (you are basically re-inventing Python). – pat Mar 08 '14 at 20:14
  • @pat I do plan on having assembler directives, I'm just not sure about the best way to parse the assembly because I think using something like lex and yacc would be overkill for simple instructions. Please correct me if I'm wrong or if you know of a better way to do it though. – user3318845 Mar 08 '14 at 20:15
  • @pat I thought that custom assembly was byte code, well byte code that hasn't been encoded as binary or hexadecimal. Am I wrong? – user3318845 Mar 08 '14 at 20:16
  • Do you plan to execute the instructions as you parse them? If so, any instruction that gets executed more than once (like in a loop) is going to have to be parsed multiple times. If you instead convert the text into byte code ahead of time, then your execution engine only has to look at the byte code, and consists of just a big switch statement. – pat Mar 08 '14 at 20:21
  • I was planning on executing instructions as the VM sees them, but now that I think about it, it's probably better to use byte code. The problem with byte code is that I have no idea how to encode strings. For example, would I do it like this: 0001 = a, 0002 = b, or is there a better way than that? Thanks for your answers, by the way! – user3318845 Mar 08 '14 at 20:24
  • Take your add example above. As assembly, that line takes 78 bytes (including the newline). If you execute this directly, you will have to read 78 bytes each time. With byte code, you could get it down to 1 or 2 bytes. – pat Mar 08 '14 at 20:24
  • let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/49325/discussion-between-pat-and-user3318845) – pat Mar 08 '14 at 20:25
  • (I wonder how this ended. Funny how OP equates "regular expression" with "lightweight".) – Jongware Jun 12 '15 at 13:19
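
The byte code suggestion from the comments can be sketched roughly as follows, in Python. Everything specific here is an assumption made for illustration and not something the thread settled on: the opcode numbers, the 16-register limit, and the 2-byte encoding (4 bits each for the opcode and three register indices). The text is parsed once by the hypothetical `assemble`, and the hypothetical `run` only ever looks at bytes:

```python
# Hypothetical opcode numbering, chosen only for this sketch.
OPCODES = {"add": 0x1, "sub": 0x2, "halt": 0xF}


def reg(name):
    """Turn a register name such as 'r3' into its index (r0..r15 here)."""
    index = int(name.lstrip("rR"))
    if not 0 <= index < 16:
        raise ValueError(f"register out of range: {name}")
    return index


def assemble(source):
    """Parse the assembly text once and emit 2 bytes per instruction."""
    code = bytearray()
    for line in source.splitlines():
        text = line.split(";", 1)[0].strip()  # drop the optional comment
        if not text:
            continue
        parts = text.split(None, 1)
        mnemonic = parts[0].lower()
        operands = []
        if len(parts) > 1:
            operands = [reg(op.strip()) for op in parts[1].split(",")]
        operands += [0] * (3 - len(operands))  # pad unused operand slots
        rd, rs1, rs2 = operands
        code.append(OPCODES[mnemonic] << 4 | rd)  # byte 0: opcode, dest
        code.append(rs1 << 4 | rs2)               # byte 1: two sources
    return bytes(code)


def run(code, regs):
    """Execute the byte code; the if/elif chain plays the role of a switch."""
    pc = 0
    while pc < len(code):
        op, rd = code[pc] >> 4, code[pc] & 0xF
        rs1, rs2 = code[pc + 1] >> 4, code[pc + 1] & 0xF
        pc += 2
        if op == 0x1:
            regs[rd] = regs[rs1] + regs[rs2]
        elif op == 0x2:
            regs[rd] = regs[rs1] - regs[rs2]
        elif op == 0xF:
            break
        else:
            raise ValueError(f"unknown opcode: {op}")
    return regs
```

With this encoding, the `add r1, r2, r3` line from the question assembles to the two bytes `0x11 0x23`, so an instruction inside a loop is decoded with a few shifts and masks instead of being re-parsed from text each time.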

0 Answers