Questions tagged [assembly]

Assembly language questions. Please tag the processor and/or the instruction set you are using, as well as the assembler, a valid set should be like this: ([assembly] [x86] [gnu-assembler] or [att]). Use the [.net-assembly] tag instead for .NET assemblies, [cil] for .NET assembly language, and for Java bytecode, use the tag java-bytecode-asm instead.

Assembly is a family of very low-level programming languages, just above machine code. In assembly, each statement corresponds to a single machine code instruction. These instructions are represented as mnemonics in the given assembly language and are converted into executable machine code by a utility program referred to as an assembler; the conversion process is referred to as assembly, or assembling the code.

Language design

Basic elements

There is a large degree of diversity in the way that assemblers categorize statements and in the nomenclature that they use. In particular, some describe anything other than a machine mnemonic or extended mnemonic as a pseudo-operation (pseudo-op). A typical assembly language consists of three types of instruction statements that are used to define program operations:

  • Opcode mnemonics
  • Data sections
  • Assembly directives

Opcode mnemonics and extended mnemonics

Instructions (statements) in assembly language are generally very simple, unlike those in high-level languages. Generally, a mnemonic is a symbolic name for a single executable machine language instruction (an opcode), and there is at least one opcode mnemonic defined for each machine language instruction. Each instruction typically consists of an operation or opcode plus zero or more operands. Most instructions refer to a single value, or a pair of values. Operands can be immediate (value coded in the instruction itself), registers specified in the instruction or implied, or the addresses of data located elsewhere in storage. This is determined by the underlying processor architecture: the assembler merely reflects how this architecture works. Extended mnemonics are often used to specify a combination of an opcode with a specific operand. For example, the System/360 assemblers use B as an extended mnemonic for BC with a mask of 15 and NOP for BC with a mask of 0.

Extended mnemonics are often used to support specialized uses of instructions, often for purposes not obvious from the instruction name. For example, many CPU's do not have an explicit NOP instruction, but do have instructions that can be used for the purpose. In 8086 CPUs the instruction xchg ax,ax is used for nop, with nop being a pseudo-opcode to encode the instruction xchg ax,ax. Some disassemblers recognize this and will decode the xchg ax,ax instruction as nop. Similarly, IBM assemblers for System/360 and System/370 use the extended mnemonics NOP and NOPR for BC and BCR with zero masks. For the SPARC architecture, these are known as synthetic instructions

Some assemblers also support simple built-in macro-instructions that generate two or more machine instructions. For instance, with some Z80 assemblers the instruction ld hl,bc is recognized to generate ld l,c followed by ld h,b. These are sometimes known as pseudo-opcodes.

Tag use

Use the tag for assembly language programming questions, on any processor. You should also use a tag for your processor or instruction set architecture (, , , , , etc). Consider a tag for your assembler as well (, , , et cetera).

If your question is about inline assembly in C or other programming languages, see . For questions about .NET assemblies, use instead and for .NET's Common Intermediate Language, use . For Java ASM, use the tag .

Resources

Beginner's resources

Assembly language tutorials, guides, and reference material

43242 questions
280
votes
6 answers

What is exactly the base pointer and stack pointer? To what do they point?

Using this example coming from wikipedia, in which DrawSquare() calls DrawLine(), (Note that this diagram has high addresses at the bottom and low addresses at the top.) Could anyone explain me what ebp and esp are in this context? From what I see,…
devoured elysium
  • 101,373
  • 131
  • 340
  • 557
279
votes
10 answers

Assembly code vs Machine code vs Object code?

What is the difference between object code, machine code and assembly code? Can you give a visual example of their difference?
mmcdole
  • 91,488
  • 60
  • 186
  • 222
278
votes
3 answers

What is a retpoline and how does it work?

In order to mitigate against kernel or cross-process memory disclosure (the Spectre attack), the Linux kernel1 will be compiled with a new option, -mindirect-branch=thunk-extern introduced to gcc to perform indirect calls through a so-called…
BeeOnRope
  • 60,350
  • 16
  • 207
  • 386
268
votes
5 answers

Why does GCC use multiplication by a strange number in implementing integer division?

I've been reading about div and mul assembly operations, and I decided to see them in action by writing a simple program in C: File division.c #include #include int main() { size_t i = 9; size_t j = i / 5; …
qiubit
  • 4,708
  • 6
  • 23
  • 37
233
votes
4 answers

Why would introducing useless MOV instructions speed up a tight loop in x86_64 assembly?

Background: While optimizing some Pascal code with embedded assembly language, I noticed an unnecessary MOV instruction, and removed it. To my surprise, removing the un-necessary instruction caused my program to slow down. I found that adding…
tangentstorm
  • 7,183
  • 2
  • 29
  • 38
227
votes
8 answers

Show current assembly instruction in GDB

I'm doing some assembly-level debugging in GDB. Is there a way to get GDB to show me the current assembly instruction in the same way that it shows the current source line? The default output after every command looks like this: 0x0001433f 990 …
JSBձոգչ
  • 40,684
  • 18
  • 101
  • 169
225
votes
25 answers

Protecting executable from reverse engineering?

I've been contemplating how to protect my C/C++ code from disassembly and reverse engineering. Normally I would never condone this behavior myself in my code; however the current protocol I've been working on must not ever be inspected or…
graphitemaster
  • 3,483
  • 4
  • 19
  • 14
216
votes
32 answers

Why aren't programs written in Assembly more often?

It seems to be a mainstream opinion that assembly programming takes longer and is more difficult to program in than a higher level language such as C. Therefore it seems to be recommend or assumed that it is better to write in a higher level…
mudgen
  • 7,213
  • 11
  • 46
  • 46
212
votes
5 answers

The point of test %eax %eax

Possible Duplicate: x86 Assembly - ‘testl’ eax against eax? I'm very very new to assembly language programming, and I'm currently trying to read the assembly language generated from a binary. I've run across test %eax,%eax or test %rdi,…
pauliwago
  • 6,373
  • 11
  • 42
  • 52
201
votes
21 answers

Is inline assembly language slower than native C++ code?

I tried to compare the performance of inline assembly language and C++ code, so I wrote a function that add two arrays of size 2000 for 100000 times. Here's the code: #define TIMES 100000 void calcuC(int *x,int *y,int length) { for(int i = 0; i…
user957121
  • 2,946
  • 4
  • 24
  • 36
194
votes
4 answers

What are the calling conventions for UNIX & Linux system calls (and user-space functions) on i386 and x86-64

Following links explain x86-32 system call conventions for both UNIX (BSD flavor) & Linux: http://www.int80h.org/bsdasm/#system-calls http://www.freebsd.org/doc/en/books/developers-handbook/x86-system-calls.html But what are the x86-64 system call…
claws
  • 52,236
  • 58
  • 146
  • 195
190
votes
3 answers

Why does GCC generate such radically different assembly for nearly the same C code?

While writing an optimized ftol function I found some very odd behaviour in GCC 4.6.1. Let me show you the code first (for clarity I marked the differences): fast_trunc_one, C: int fast_trunc_one(int i) { int mantissa, exponent, sign, r; …
orlp
  • 112,504
  • 36
  • 218
  • 315
183
votes
13 answers

Can num++ be atomic for 'int num'?

In general, for int num, num++ (or ++num), as a read-modify-write operation, is not atomic. But I often see compilers, for example GCC, generate the following code for it (try here): void f() { int num = 0; num++; } f(): push rbp …
Leo Heinsaar
  • 3,887
  • 3
  • 15
  • 35
183
votes
1 answer

What is the best way to set a register to zero in x86 assembly: xor, mov or and?

All the following instructions do the same thing: set %eax to zero. Which way is optimal (requiring fewest machine cycles)? xorl %eax, %eax mov $0, %eax andl $0, %eax
balajimc55
  • 2,192
  • 2
  • 13
  • 15
181
votes
12 answers

What is the difference between MOV and LEA?

I would like to know what the difference between these instructions is: MOV AX, [TABLE-ADDR] and LEA AX, [TABLE-ADDR]
naveen
  • 53,448
  • 46
  • 161
  • 251