
I'm trying to write some special routine in assembly, for x86-64 (even x86 example is fine). The problem: my immediates are only resolved at link time.

For example,

addq $Label2-Label1, %rax

will add the difference between the two labels/symbols to rax. Unfortunately, because the GNU Assembler makes only one pass (I don't know why; even open-source assemblers like FASM make multiple passes), it cannot resolve the difference itself, so it leaves it for the linker.

Worse, it reserves 4 bytes (a 32-bit immediate) for the linker, which is not what I want, because the difference between the labels is always within the -128 to +127 range.

My question: how do I force or specify that the instruction should use an 8-bit immediate? What's the syntax here? AT&T or Intel syntax, either is fine. For example, in NASM you write:

add rax, byte Label2-Label1

to specify an 8-bit immediate. But how do I do this in GAS? What's the syntax to force an 8-bit immediate even when GAS doesn't yet know the immediate's value? I need this in GAS for specific reasons, so please don't tell me to use NASM as an answer!

EDIT: I'm sorry, I forgot to mention that yes, this is a "forward reference". Both labels are defined after the instruction, which is why GAS can't resolve them. I thought it was a relocation, but using '.byte label2-label1' definitely works (I have tested it), so it should be possible if there were some syntax for it...

René Nyffenegger
kktsuri
  • Do those 3 extra bytes make that much of a difference? – Drew McGowen Aug 20 '14 at 21:59
  • Well, if it is not supposed to make a difference then just use a C compiler. Instruction length is one of the major bottlenecks on x86_64. – Hans Passant Aug 20 '14 at 22:18
  • Well, I thought there was a simple syntax I was missing, and the documentation has absolutely zero info about instruction sets and syntaxes (other than "comparing" it to Intel), or at least I could not find anything. But again, NASM, for example, can do this with 'byte' and uses Intel syntax, so I was wondering what GAS's syntax would be (it doesn't work with 'byte' even if I use Intel syntax in it...). And this kind of instruction is part of some bigger table computing offsets, repeated many times. – kktsuri Aug 21 '14 at 22:53

2 Answers

If Label1 and Label2 are in the same source file, then the problem isn't related to the linker (GAS doesn't generate any relocations in that case), nor is it really due to GAS being a one-pass assembler. It's smart enough to generate correctly sized branches, which is a similar problem.

The problem comes down to GAS not being smart enough to choose the right-sized instruction in cases other than jumps and branches, and not having any way to explicitly specify the size of the operand. The ".d8" displacement suffix is almost the syntax you want, but an immediate is technically not a displacement. It wouldn't help anyway: leal.d8 Label2-Label1(%eax),%eax doesn't work either, even though it does use a displacement.

That really leaves only one alternative: hand-assembling the opcode. For example, in 32-bit assembly (0x83 0xC0 encodes an add of a sign-extended 8-bit immediate to %eax):

.byte 0x83, 0xC0, (label2 - label1)
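
For the 64-bit `addq ..., %rax` from the question, the same trick needs a REX.W prefix byte in front. The following is a minimal sketch, not an established idiom: the macro name `addq_imm8_rax` and the file name `demo.s` are made up for illustration, the macro only handles `%rax` as the destination, and the `_start`/`syscall` scaffolding assumes x86-64 Linux.

```asm
# Hypothetical helper macro (not part of GAS): "addq $imm8, %rax" with the
# sign-extended 8-bit immediate form forced by hand-encoding the bytes.
.macro addq_imm8_rax expr
    # 0x48 = REX.W prefix, 0x83 = ADD r/m64, imm8 (opcode extension /0),
    # 0xC0 = ModRM byte selecting %rax as the destination register.
    .byte 0x48, 0x83, 0xC0
    # The immediate itself; the assembler resolves the label difference.
    .byte \expr
.endm

    .text
    .globl _start
_start:
    xorl %eax, %eax             # rax = 0
    # Forward reference to both labels, emitted as a 4-byte instruction.
    addq_imm8_rax Label2-Label1
    movq %rax, %rdi             # exit status = Label2 - Label1
    movl $60, %eax              # exit system call number on x86-64 Linux
    syscall
Label1:
    .skip 5                     # five bytes between the two labels
Label2:
```

Built with `as demo.s -o demo.o` and `ld demo.o -o demo`, this should exit with status 5, and `objdump -d demo.o` should show the add as a 4-byte instruction. Note that `.byte` has no idea the value will be sign-extended, so keeping the difference within -128 to +127 remains your responsibility.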

As for why GAS is only a one-pass assembler, and in general doesn't do a number of things other assemblers do, the answer is simple: it's primarily designed to assemble the output of GCC.

Ross Ridge
  • If the assembler is a one pass assembler, and if label1 and label2 are past the instruction that performs the add, then it's stuck with forward references, and I'm not sure how a one pass assembler would handle this. – rcgldr Aug 21 '14 at 01:28
  • I suppose it depends how you define "one pass assembler". GAS defines itself as being one pass, and manages to determine the distance of a forward branch and generate the smallest sized instruction correctly. – Ross Ridge Aug 21 '14 at 01:34
  • The terminology I'm used to for a "two pass assembler" is that the first pass generates a symbol table and addresses/offsets for the symbols (labels), then the second pass generates data and code using the symbol table as needed. A "multi-pass" assembler could do things like reducing size of forward reference instructions that have variable size, with a small reduction in code size on each pass (for the optimized forward references). – rcgldr Aug 21 '14 at 03:53
  • You'll have to take it up with the GAS maintainers if you think they're using the terminology incorrectly. – Ross Ridge Aug 21 '14 at 05:15
  • The terminology is OK, a single pass assembler will make default assumptions about forward references, and place additional data at the end of an object file about references that need to be filled in by the linker. In this case the additional data needs to handle the case of label minus label. I'm not sure what the advantage of using a single pass assembler is. – rcgldr Aug 21 '14 at 06:46
  • The GNU assembler fully resolves forward references itself without having to rely on the linker. That's what I meant by GAS not generating relocations in my answer. Those are the additional bits of data assemblers put in object files to get the linker to fix up things they can't resolve themselves. GAS is able to resolve `label2 - label1` itself even if the two labels are defined later. – Ross Ridge Aug 21 '14 at 14:19
  • So what the wiki article on [single pass assembler](http://en.wikipedia.org/wiki/Assembly_language#Number_of_passes) calls errata, is handled by the assembler as a post pass processing phase to fill in forward reference information (as opposed to external reference) in an object file? – rcgldr Aug 21 '14 at 15:47
  • Presumably, but I don't know the details of GAS's implementation. – Ross Ridge Aug 21 '14 at 15:52
  • I forgot to mention, that yes indeed, they are forward references which is why GAS cannot resolve them I guess? I tried to put an example with labels before the instruction, and it worked ok. Your solution with hard-coding the instruction works but it is really something I'd want as a last resort only. It is kind of dumb it does not have such a syntax as an assembler, sigh. Sadly it looks like you may be right. I will accept your answer tomorrow, thank you, I'm hoping someone with some intricate knowledge might know of an easier hack or something so I give it a day or so. – kktsuri Aug 21 '14 at 22:58
  • Well, it's not that GAS isn't able to resolve the value of `Label2 - Label1` when forward referenced. It's just that it does so after it has chosen the instruction encoding for the `addq` instruction. With branches, GAS is smart enough to delay choosing the encoding of forward branches so it can pick the smallest encoding once it knows how far the branch will jump. It's just not smart enough to do this with all instructions, presumably because while GCC generates a lot of branch instructions, it never does anything like what you're trying to do. – Ross Ridge Aug 21 '14 at 23:26
  • I see, thanks for the clarification I understand now. Still wish it had a way in syntax to use a specific instruction/modifier (in this case, the sign-extended imm8 is a different instruction than imm32). – kktsuri Aug 22 '14 at 22:22
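
The behaviour discussed in these comments can be checked by assembling a small test file and disassembling the result. This is only a sketch for experimentation: the file name `relax.s` is made up, and the comments in the code restate what the question and answer report rather than guaranteed output of any particular GAS version.

```asm
    .text
    .globl _start
_start:
    # Forward branch: GAS delays the encoding choice, so a nearby target
    # gets the short two-byte form of jmp.
    jmp  Label2
    # Forward label difference used as an immediate: as reported in the
    # question, the encoding is chosen before the value is known, so a
    # 32-bit immediate is emitted.
    addq $Label2-Label1, %rax
Label1:
    nop
Label2:
    ret
```

Assemble with `as relax.s -o relax.o`, then compare the lengths of the two instructions in the output of `objdump -d relax.o`. Running `objdump -r relax.o` lists any relocations left for the linker, which should show whether the immediate was resolved by the assembler itself.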

Unfortunately, this is not possible; because GAS doesn't figure out what the difference should be, it has no choice but to leave a 4-byte relocation entry for the linker to resolve. This is because the object file format (which is more than likely ELF) doesn't support 8-bit relocations - it only supports 32-bit relocations. Thus, it will always use a 32-bit immediate.

Drew McGowen
  • But if I use '.byte label2-label1', it works with only a byte. I know that GAS can't know what the difference should be, because it is one pass and the labels are forward references. However, in my lookup code I know that the difference will be within the range of a sign-extended 8-bit immediate. That is why I am asking for a way to "force" GAS to use an 8-bit immediate (and, e.g., give a warning or error if out of range, but that's not needed). Like I said, it works with .byte. I thought I was missing some syntax, but I guess it may not be possible, sadly. – kktsuri Aug 21 '14 at 23:01
  • If it works with `.byte`, then you might be able to get away with manually encoding the form of the instruction you want, i.e. just assemble a dummy file with the instruction with an immediate of 0, then copy all the bytes except the last, then manually put the `.byte` directive right after. – Drew McGowen Aug 22 '14 at 00:51