
In school we learned the ARC assembly language, which the book "Principles of Computer Architecture" by Miles Murdocca uses to teach computer architecture. An ARC program looks like this:

!
!  A simple ARC program to add two numbers
!
        .begin
        .org 2048
 main:  ld [x], %r1            ! load x into %r1
        ld [y], %r2            ! load y into %r2
        addcc %r1, %r2, %r3    ! %r3 <- %r1 + %r2  
        st %r3, [z]            ! store %r3 into z
        halt                   ! halt simulator
        jmpl %r15+4, %r0       ! standard return
 x:     15
 y:     9
 z:     0
        .end

I want to hand-write a parser for the language, but I struggle to apply my basic knowledge of parsers to an assembly language. For example, I can't wrap my head around the abstract syntax tree of an assembly program.

Can someone point out the differences between parsing a high-level language and parsing an assembly language?

phuclv
LuMa

1 Answer


Parsing is not fundamentally different. You still have tokens, labels, syntax, etc.

The key difference you may find is that most higher-level languages support deeply nested constructs, which result in deeper syntax trees (think nested loops, anonymous functions, etc.).

In assembly, code is generally structured more like a simple list of instructions, one level deep, with a fairly close correspondence to the underlying machine instructions. As a result, you may find it is possible to express the syntax "tree" as a single flattened list.
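As a rough illustration of the "flattened list" idea, here is a minimal Python sketch that turns ARC source like the program in the question into a flat list of statements. The names `Statement`, `parse_line` and `parse_program` are made up for this example, and the handling of labels, `!` comments and comma-separated operands is inferred only from the sample program, not from a full ARC grammar.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Statement:
    """One source line: optional label, an op, and its operands.

    `op` holds whatever comes first after the label: a mnemonic (ld),
    a directive (.org) or a literal data word (15). Telling these apart
    is left to a later pass in this sketch.
    """
    label: Optional[str] = None
    op: Optional[str] = None
    operands: List[str] = field(default_factory=list)


# Optional "name:" prefix, then the rest of the line.
LINE_RE = re.compile(r"^\s*(?:(?P<label>[A-Za-z_]\w*)\s*:)?\s*(?P<rest>.*)$")


def parse_line(line: str) -> Optional[Statement]:
    """Parse one line; return None for blank or comment-only lines."""
    code = line.split("!", 1)[0].strip()  # assumption: '!' starts a comment
    if not code:
        return None
    match = LINE_RE.match(code)
    label = match.group("label")
    rest = match.group("rest").strip()
    if not rest:
        return Statement(label=label)  # a label on a line of its own
    parts = rest.split(None, 1)
    op = parts[0]
    operands = [p.strip() for p in parts[1].split(",")] if len(parts) > 1 else []
    return Statement(label, op, operands)


def parse_program(source: str) -> List[Statement]:
    """The whole 'AST' is just a list of statements, one level deep."""
    return [s for s in map(parse_line, source.splitlines()) if s is not None]
```

For example, `parse_line("main:  ld [x], %r1   ! load x into %r1")` yields a `Statement` with label `main`, op `ld` and operands `["[x]", "%r1"]`. Operands are kept as raw strings here; parsing them further (registers, memory references, expressions) is where any real nesting would show up.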

mikera
  • Are there any nestable/context-sensitive instructions in assembly (for common archs like x86, ARM64, etc.)? – Alexander Mar 01 '17 at 04:45
  • Many are context sensitive (flags, register state etc.) but to my knowledge none are truly "nested" (in the sense of being high order constructs that may contain blocks of arbitrary other instructions) – mikera Mar 01 '17 at 05:01
  • Yeah I can't think of any either, but there are a *lot* of asm instructions I've never worked with. – Alexander Mar 01 '17 at 05:01
  • We should of course distinguish between machine code and assembly languages. It is certainly possible for assembly languages to contain higher-level nested features (macro definitions etc.), even if the underlying machine code doesn't. – mikera Mar 01 '17 at 05:06
  • Yeah, I'm aware of those. – Alexander Mar 01 '17 at 05:13
  • @Alexander - `are there any nestable/context-sensitive instructions in assembly`: not instructions, but MASM (ML.EXE) 6.0 and later versions include `dot` directives such as .if .else .endif .while, ... and those directives can be nested, although I'm not sure how this would affect parsing. Expressions for operands can be nested using parentheses. – rcgldr Mar 01 '17 at 08:32
  • @Alexander: Don't forget that most assemblers accept expressions - e.g. you might write `ld [x+(4*1234)/7-33 + 1<<5], %r1` and the assembler would figure out the value for the expression "(4*1234)/7-33 + 1<<5" (possibly after substituting macros/defines with numbers if you actually wrote `ld [x+(FOO*1234)/BAR-33 + 1< – Brendan Nov 04 '18 at 18:15
  • @Alexander: Yes, ARM Thumb has an `it` (if-then) instruction (http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0204j/Cjabicci.html) which predicates later instructions. So to parse and verify the THEN/ELSE match between the ITTE and the predicates on the later instructions, the assembler has to keep state across source lines when parsing something like `itte eq` / `moveq r0, r1` / `addeq r0, r4` / `addne r1, #1`. [itte in arm assembly](https://stackoverflow.com/q/7042289) – Peter Cordes Nov 05 '18 at 00:30
  • In "unified" syntax, you'd omit the itte, and the assembler will automatically emit an `itxxxx` instruction before a contiguous group of predicated instructions that all use the same condition, or its inverse. So a one-pass assembler is impossible without at least some data structure for predicated instructions, unless you make it emit terrible machine code with a separate `it` for each instruction. – Peter Cordes Nov 05 '18 at 00:34
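
The point raised in the comments about operand expressions is where an assembler parser does need recursion. Below is a small recursive-descent sketch in Python that evaluates a constant expression with `+`, `-`, `*`, `/`, parentheses and symbol names. The function names and the `symbols` table are hypothetical, division is assumed to be integer (floor) division, and shift operators are left out because their precedence differs between assemblers.

```python
import re
from typing import Dict, List, Optional

# A token is a decimal number, a symbol name, or one operator/parenthesis.
TOKEN_RE = re.compile(r"\s*(\d+|[A-Za-z_]\w*|[-+*/()])")


def tokenize(text: str) -> List[str]:
    tokens: List[str] = []
    text = text.rstrip()
    pos = 0
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if not match:
            raise SyntaxError(f"unexpected character {text[pos]!r} at {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def evaluate(text: str, symbols: Dict[str, int]) -> int:
    """Evaluate a constant operand expression (hypothetical helper)."""
    tokens = tokenize(text)
    pos = 0

    def peek() -> Optional[str]:
        return tokens[pos] if pos < len(tokens) else None

    def advance() -> Optional[str]:
        nonlocal pos
        tok = peek()
        pos += 1
        return tok

    def primary() -> int:
        tok = advance()
        if tok is None:
            raise SyntaxError("unexpected end of expression")
        if tok == "(":
            value = expr()          # recursion: this is the nested part
            if advance() != ")":
                raise SyntaxError("expected ')'")
            return value
        if tok == "-":
            return -primary()       # unary minus
        if tok.isdigit():
            return int(tok)
        if tok in symbols:
            return symbols[tok]     # label or define, already resolved
        raise SyntaxError(f"unexpected token {tok!r}")

    def term() -> int:
        value = primary()
        while peek() in ("*", "/"):
            op = advance()
            rhs = primary()
            value = value * rhs if op == "*" else value // rhs
        return value

    def expr() -> int:
        value = term()
        while peek() in ("+", "-"):
            op = advance()
            rhs = term()
            value = value + rhs if op == "+" else value - rhs
        return value

    result = expr()
    if peek() is not None:
        raise SyntaxError(f"trailing token {peek()!r}")
    return result
```

A call such as `evaluate("x+(4*1234)/7-33", {"x": 2048})` assumes `x` has already been resolved to an address; in a real assembler, forward references to labels usually mean the expression has to be kept around and evaluated in a later pass.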