2

I know that I cannot write assembly language that will run/compile on all machines, because they have different instruction sets, opcodes, registers, etc. My question is: even though the instruction set would be different, is the assembly syntax (or the language itself) the same for any architecture?

Peter Cordes
eleethesontai
  • 2
    Assembly is an abstract term describing any low-level programming language in which there is a very strong correspondence between the instructions in the language and the architecture's machine code instructions. Assembly language for the Motorola MC6800 is not the same as a language for another CPU. – Tsakiroglou Fotis Mar 04 '20 at 16:12
  • 1
    not only are they different but you will see different assembly languages for the same architecture – old_timer Mar 05 '20 at 06:57

3 Answers

4

There are broad similarities among most assemblers. The syntax is nearly always line-oriented, like

[label:]  mnemonic [operand list]

although a few assemblers use spaces instead of commas to separate operands.

And some historical assemblers distinguish label vs. mnemonic based on starting column instead of via a : after label names. (So they enforce good style: labels at the far left, mnemonics indented.) A label defines a symbol name to refer to that position in the output. (In many assemblers, a non-mnemonic on a line by itself is also treated as a label, even without a :.)
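As a rough illustration of the `[label:] mnemonic [operand list]` shape, here is a minimal Python sketch of a line tokenizer. The function name `parse_line`, the `;` comment marker, and the identifier rules are assumptions made for this example, not the grammar of any real assembler. Splitting operands on commas would also break on addressing modes that contain commas.

```python
import re

# Optional "label:", optional mnemonic, then whatever is left as operands.
# The identifier character sets are an assumption; real assemblers differ.
LINE_RE = re.compile(
    r"^\s*(?:(?P<label>[A-Za-z_.$][\w.$]*):)?"
    r"\s*(?P<mnemonic>[A-Za-z.][\w.]*)?"
    r"\s*(?P<operands>.*?)\s*$"
)

def parse_line(line):
    """Split one source line into (label, mnemonic, operand list)."""
    code = line.split(";", 1)[0]  # drop a trailing ';' comment (assumed syntax)
    m = LINE_RE.match(code)
    # Naive comma split: fine for simple operands, wrong for e.g. (%eax,%ebx,4)
    operands = [op.strip() for op in m.group("operands").split(",") if op.strip()]
    return m.group("label"), m.group("mnemonic"), operands

print(parse_line("loop:  add r1, r2, r3 ; comment"))
print(parse_line("  mov eax, 1"))
print(parse_line("done:"))
```

The same three-field result comes out whether the mnemonics belong to x86, ARM, or something else, which is the sense in which the line structure is shared even when the meaning is not.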

Some syntaxes put the destination operand last, many others put it first, but as far as the basic grammar of parsing lines into tokens goes, that's a semantic issue, not a syntactic one.

A few assemblers with significantly different syntax exist, like x86 HLA where instructions look like C function calls.

The macro processor built into most assemblers differs significantly between assemblers. So do directive names, like .long vs. dd vs. dword.

Classic MIPS assembler has a .align directive that brings previous labels with it, instead of just emitting padding at the current location. (And without .set noreorder, the assembler will actually optimize your code to fill branch-delay slots.) Again that's not syntactic, but is a big semantic difference in what .align means.

Other than that, it's pretty much universal that each line of asm assembles to 0 or more bytes of output in some section, independent of surrounding lines.
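To make the "each line assembles to 0 or more bytes" idea concrete, here is a toy sketch in Python. The three opcode bytes are the real one-byte x86 encodings of `nop`, `ret` and `int3`; everything else (the function names, the `;` comment marker, the handling of `.byte`) is a simplifying assumption for illustration, and the toy ignores labels as operands entirely.

```python
# Real one-byte x86 encodings; the rest of this "assembler" is a toy.
OPCODES = {"nop": b"\x90", "ret": b"\xc3", "int3": b"\xcc"}

def assemble_line(line):
    """Return the bytes that one source line contributes (possibly none)."""
    text = line.split(";", 1)[0].strip()  # ';' comments are an assumption
    if not text or text.endswith(":"):    # blank line or bare label: 0 bytes
        return b""
    mnemonic, _, rest = text.partition(" ")
    if mnemonic == ".byte":               # data directive: one byte per value
        return bytes(int(tok.strip(), 0) for tok in rest.split(","))
    return OPCODES[mnemonic]              # instruction: exactly 1 byte here

def assemble(source):
    # Each line is translated on its own and the results are concatenated.
    return b"".join(assemble_line(l) for l in source.splitlines())

print(assemble("start:\n nop\n .byte 1, 2, 0xff\n ret ; done\n").hex())
```

The label line contributes nothing, the directive contributes three bytes, and each instruction contributes one.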

Peter Cordes
  • First you say there are broad similarities. Then you go on to show how all assemblers are completely different in the syntax they accept... – TonyK Mar 04 '20 at 17:43
  • 2
    @TonyK: I'm saying there's one main general syntax that most use (regardless of semantic differences), and a few assemblers that do something different even syntactically. When I look at asm for an ISA I've never seen before, it's always pretty obvious what's a mnemonic and what the (explicit) operands are. I don't always know if the operands are register names or some special immediate modifiers for that mnemonic, but in terms of just syntax it's rarely surprising. – Peter Cordes Mar 04 '20 at 18:22
4

My question is, even though the instruction set would be different, is the assembly syntax (or the language itself) the same for any architecture?

No!

Just for x86, there are a dozen different assemblers, each with its own quirks, so each accepts a slightly different language: there's GAS, MASM, NASM, TASM, FASM, ASM... Few programs will assemble with all of these x86 assemblers.

There's AT&T syntax vs. Intel: destination last vs. destination first.

There are varied requirements around directives: .proc, .endp, etc.

There's Intel's beautiful byte ptr syntax for determining operation size/width, vs. most of the rest of the world's .b, .w, .l opcode suffixes (sometimes without the .).

Some assemblers like the : after label, others don't allow it (or require a , instead).

Some require special characters to differentiate register names from other identifiers (e.g. % prefix for some, $ prefix for others), others don't.
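Several of these differences show up together when comparing AT&T and Intel syntax for x86: AT&T puts the destination last, marks operand size with a mnemonic suffix, and prefixes registers with % and immediates with $. The Python sketch below converts a few trivial AT&T lines to Intel style. The function name `att_to_intel` and the small `KNOWN` table are made up for this example, and it only handles general-purpose register and immediate operands, so treat it as a rough illustration rather than a real translator.

```python
# Base mnemonics this toy knows how to strip a size suffix from (assumed list).
KNOWN = {"mov", "add", "sub", "xor", "and", "or", "cmp"}

def att_to_intel(line):
    """Convert e.g. 'movl $1, %eax' to 'mov eax, 1' (registers/immediates only)."""
    mnemonic, _, rest = line.strip().partition(" ")
    # AT&T encodes operand size as a b/w/l/q suffix; only strip it when the
    # remainder is a known mnemonic, so 'sub' or 'call' are left untouched.
    if mnemonic[-1:] in ("b", "w", "l", "q") and mnemonic[:-1] in KNOWN:
        mnemonic = mnemonic[:-1]
    # Drop the % (register) and $ (immediate) sigils.
    operands = [op.strip().lstrip("%$") for op in rest.split(",") if op.strip()]
    operands.reverse()  # AT&T: source first; Intel: destination first
    return (mnemonic + " " + ", ".join(operands)).strip()

print(att_to_intel("movl $1, %eax"))
print(att_to_intel("addq %rbx, %rax"))
```

Memory operands are deliberately left out, because their notation differs far more between the two syntaxes than a sigil and an ordering swap.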

Syntax for addressing modes also varies significantly, e.g. in ARM's [] notation, a constant placed after the brackets (as in ldr r0, [r1], #4) indicates post-indexed addressing: the pointer register is updated after the access.

And that's without getting into the names of the opcodes.

On Intel we use call for the instruction that invokes a function (transfers pc to the function while capturing the return address); it's jal on MIPS & RISC-V, and bsr, jsr, bl, or jms on others, etc.

The instruction for invoking system calls is variously named syscall, ecall, trap, sc, int, swi, svc, etc.

In short, there's no standardization of language, grammar, or syntax across assemblers.


As for similarities, broadly speaking, there are the concepts of if-goto conditional branching (and unconditional branching) as the mechanism for control flow constructs, labels as branch targets and data targets, one instruction per line (as @Peter mentions), and a mnemonic opcode with separate operands. But these similarities are conceptual rather than syntactic.

Erik Eidt
0

There is such a thing as a high-level assembler: https://en.wikipedia.org/wiki/High-level_assembler. However, there is little sense in using one now, since as that page says:

High-level assemblers typically provide instructions that directly assemble one-to-one into low-level machine code as in any assembler

Different architectures usually provide different features, like conditional instructions, which cannot be mapped to another architecture's assembly.

If you need to create portable code, use C. It gives you a lot of possibilities for writing low-level programs. If you need to use a specific architecture feature, you can use inline assembly (in GCC it's called extended asm).

Nabuchodonozor
  • I am not trying to create portable code. I understand that assembly is specific to the architecture you are coding for. I was just curious, as I read tutorials on assembly for x86 and ARM. I noticed that they are similar but have different registers, commands, etc., so I was wondering if that logic was consistent across platforms. So if I decided to try my hand at assembly for some other device that is not ARM/x86 etc., I would have to learn the new commands, but the syntax of the assembly file (i.e. .data, macros, etc.) would still be the same. – eleethesontai Mar 04 '20 at 16:27
  • Once you learn one, the next one is easier, and the next easier still. Instruction sets often have similar operations: add, subtract, and, or, xor, read a memory location, write a memory location, etc. If you look at it one way, then it is just a matter of syntax, but the syntax will vary across ISAs and can/will vary within an ISA across different tools. – old_timer Mar 05 '20 at 06:59