How to use ins instruction with GNU assembler

Question

How do I use the x86 ins instruction with the GNU assembler? The instruction reference suggests the syntax INS m8/16/32, DX, where e.g. m16 is (I assume) any 16-bit general-purpose register whose only purpose is to signify whether a byte/word/doubleword should be read. Is that right?

Unfortunately, `as` rejects `ins %ax,%dx` with `Error: operand type mismatch for 'ins'`. Why is that?

For the record, I know I could simply use insb etc. but I'm calling this instruction via inline assembly inside a C++ program and the size of the input to be read depends on a template parameter (and string handling at compile time is not very practical).

EDIT: here is what I have now, for reference (I don't really like the macro)

#define INS(T) \
  __asm__ __volatile__("repne \n\t" \
                       "ins" T \
                       : "=D" (dest), "=c" (count) \
                       : "d" (port), "0" (dest), "1" (count) \
                       : "memory", "cc")

template<typename T>
void ins(uint16_t port, uint32_t dest, uint32_t count);

template<>
void ins<uint8_t>(uint16_t port, uint32_t dest, uint32_t count)
{ INS("b"); }

template<>
void ins<uint16_t>(uint16_t port, uint32_t dest, uint32_t count)
{ INS("w"); }

template<>
void ins<uint32_t>(uint16_t port, uint32_t dest, uint32_t count)
{ INS("l"); }

It's supposed to be a *memory* reference, not a register. The idea in Intel syntax is that you could write `ins dword ptr [rdi], dx` for a 32-bit read (aka `insd`), `ins word ptr [rdi], dx` for a 16-bit `insw` read, etc. You could maybe even write `ins dword ptr [foo], dx` and get a 32-bit read but the data would be written to `[rdi]` anyway. Nonetheless AT&T assembler syntax doesn't support this at all and the only way to specify the size is with the operand size suffix. — Nate Eldredge, Nov 21 '20 at 17:23
You *might* be able to hack it by switching the compiler to use Intel syntax instead, but I think it's going to be painful. I would try instead to figure out how to just choose the right one of `insb / insw / insl` at compile time. I don't see why you would need string handling; just write all three and use template specialization or `switch` or whatever to execute the correct one. — Nate Eldredge, Nov 21 '20 at 17:25
(Don't forget to be careful with your operands, constraints and clobbers; the instruction modifies `rdi`. And you know that you can't execute these instructions in unprivileged code, unless you've done something such as `iopl()`? I'd assume you are writing kernel or bare-metal embedded code, only it would be a bit unusual for either of those to be C++.) — Nate Eldredge, Nov 21 '20 at 17:27
@NateEldredge: Ah okay, I understand. I will have to make do with some code duplication then, I guess. I think I took care of the constraints correctly; I'll check what the compiler outputs. And yes, I'm writing a kernel. — Peter, Nov 21 '20 at 18:00
Actually, I think it can be done with operand modifiers. Writing an answer... — Nate Eldredge, Nov 21 '20 at 18:10

Nate Eldredge · Accepted Answer · 2020-11-21T18:58:41.240

It's supposed to be a memory reference, not a register. The idea in Intel syntax is that you could write ins dword ptr [rdi], dx for a 32-bit read (aka insd), ins word ptr [rdi], dx for a 16-bit insw read, etc. You could maybe even write ins dword ptr [foo], dx and get a 32-bit read but the data would be written to [rdi] anyway. Nonetheless AT&T assembler syntax doesn't support this at all and the only way to specify the size is with the operand size suffix.

In GCC inline assembly, you can get an operand size suffix b,w,l,q matching the size of an operand (based on its type) with the z operand modifier. See the GCC manual, section "Extended Asm" under "x86 operand modifiers". So if you use types appropriately and add an operand which refers to the actual destination (not the pointer to it), you can do the following:

template<typename T>
void ins(uint16_t port, T *dest, uint32_t count) {
    asm volatile("rep ins%z2"
        : "+D" (dest), "+c" (count), "=m" (*dest)
        : "d" (port)
        : "memory");
}

Try it on godbolt

It's important here that the destination be a T * instead of a generic uint32_t since the size is inferred from the type T.

I've also replaced your duplicated input and output operands with the + read-write constraint. And to nitpick, the "cc" clobber is unnecessary because rep ins doesn't affect any flags, and it would be redundant on x86 in any case because every inline asm is assumed to clobber the flags.
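
For a compiler that mishandles `%z` inside a template (the comments below report this for clang), one fallback is to spell out the three instructions and pick one from `sizeof(T)`. The following is a minimal sketch under assumptions, not part of the original answer: the names `ins_suffix` and `ins_explicit` are made up for illustration, `count` is widened to `std::size_t` so that the full count register is covered in 64-bit code as well, and the code is x86-only and has to run with I/O privilege.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not a GCC facility): maps an element size in bytes to
// the AT&T suffix of the matching INS form, or '?' if no such form exists.
constexpr char ins_suffix(std::size_t size) {
    return size == 1 ? 'b' : size == 2 ? 'w' : size == 4 ? 'l' : '?';
}

// Hypothetical name. Same register operands as the ins() above; the "=m"
// output is dropped here because the "memory" clobber already covers the
// stores made through EDI/RDI.
template <typename T>
void ins_explicit(uint16_t port, T *dest, std::size_t count) {
    static_assert(ins_suffix(sizeof(T)) != '?',
                  "INS transfers only 1, 2 or 4 bytes per element");
    // sizeof(T) is a compile-time constant, so the compiler can discard
    // the branches that do not apply to this instantiation.
    if (sizeof(T) == 1)
        asm volatile("rep insb" : "+D"(dest), "+c"(count) : "d"(port) : "memory");
    else if (sizeof(T) == 2)
        asm volatile("rep insw" : "+D"(dest), "+c"(count) : "d"(port) : "memory");
    else
        asm volatile("rep insl" : "+D"(dest), "+c"(count) : "d"(port) : "memory");
}
```

The `static_assert` is there so that an unsupported element size (for example 8 bytes, for which no INS form exists) fails at compile time instead of reaching the assembler.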

edited Nov 21 '20 at 18:58

answered Nov 21 '20 at 18:29

I am the upvote, but I believe this `asm` statement should be `volatile` given that it relies on the state of what is on an x86 port and that may differ from run to run. You don't want the compiler to optimize away port reads it may think are redundant. – Michael Petch Nov 21 '20 at 18:40
@MichaelPetch: Does volatile make a difference when output operands are specified? I hadn't even thought about that and just added the volatile out of habit. – Peter Nov 21 '20 at 18:43
@Peter If there are no output `=` and no input/output `+` operands, then asm statements are implicitly volatile and specifying volatile is not required. In this case there are input/output operands, so if you need it to be volatile you have to specify it. When using x86 ports and memory-mapped I/O (MMIO), you need to specify volatile to tell the compiler that there are external changes it isn't aware of, so the same operation may not produce the same result each time. – Michael Petch Nov 21 '20 at 18:45
@Peter: Yes, if the compiler thinks the output operand is not used afterwards, it can optimize the `asm` away. I couldn't get it to happen for this example, but here's a more trivial one: https://godbolt.org/z/E75qGe – Nate Eldredge Nov 21 '20 at 18:47
@NateEldredge: This solution is indeed much nicer than mine. I also didn't know about `+`. I actually made an error in my question: that should be `rep`, not `repne`. I copied that from somewhere else, but it doesn't really make sense now that I think about it. – Peter Nov 21 '20 at 18:52
On a side note, I noticed clang has a problem with the inline assembly. I mention it, but this is a G++ question so it's not really directly relevant. – Michael Petch Nov 21 '20 at 18:57
@MichaelPetch: Interesting, that seems like a clang bug. It supports `%zN` properly when an explicit type is given, but apparently not when it comes through a template. – Nate Eldredge Nov 21 '20 at 19:04
I am under the firm belief this isn't the first time we have encountered this bug (or one related to it) with inline assembly and constraints not working within a template in C++ (and it works fine outside a template). I recall in the last 2-3 years there was some other issue that would have been related to this and I think someone generated a bug report (against llvm/c++). I can't find it, but I am sure it exists. – Michael Petch Nov 21 '20 at 21:20
@MichaelPetch: I found [bug 4348](https://bugs.llvm.org/show_bug.cgi?id=4348) which is similar. – Nate Eldredge Nov 21 '20 at 21:22

Michael Petch · Answer 2 · 2020-11-21T20:25:35.250

This answer is in response to the first version of the question that was asked prior to the major edit by the OP. This addresses the AT&T syntax of GNU assembler for the INS instruction.

The instruction set reference for INS/INSB/INSW/INSD — Input from Port to String shows that there are really only 3 forms of the INS instruction: one each for a byte (B), a word (W), and a doubleword (D). Ultimately a BYTE (b), WORD (w), or DWORD (l) is read from the port in DX and written to ES:RDI, ES:EDI, or ES:DI. There is no form of the INS instruction that takes a register as a destination, unlike IN, which can write to AL/AX/EAX.

Note: with the IN instruction the port is considered the source and is the first parameter in AT&T syntax where the format is instruction src, dest.
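
To illustrate the contrast with IN, here is a minimal sketch of the kind of inline-assembly wrappers commonly written for it. The function names `port_in8`, `port_in16` and `port_in32` are hypothetical. Because the destination is a register rather than memory, it is expressed with the "a" constraint and no memory operand or clobber is involved; the port is the source and therefore comes first in the AT&T template. As with INS, this is x86-only and needs I/O privilege to execute.

```cpp
#include <cstdint>

// "=a" pins the result to AL/AX/EAX, which is where IN always writes.
// "Nd" allows the port to be an 8-bit immediate or the DX register.

inline uint8_t port_in8(uint16_t port) {
    uint8_t value;
    asm volatile("inb %1, %0" : "=a"(value) : "Nd"(port));  // port (src), AL (dest)
    return value;
}

inline uint16_t port_in16(uint16_t port) {
    uint16_t value;
    asm volatile("inw %1, %0" : "=a"(value) : "Nd"(port));  // port (src), AX (dest)
    return value;
}

inline uint32_t port_in32(uint16_t port) {
    uint32_t value;
    asm volatile("inl %1, %0" : "=a"(value) : "Nd"(port));  // port (src), EAX (dest)
    return value;
}
```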

In GNU assembler it is easiest to simply use one of these 3 forms:

insb      # Read BYTE from port in DX to [RDI] or ES:[EDI] or ES:[DI]
insw      # Read WORD from port in DX to [RDI] or ES:[EDI] or ES:[DI]
insl      # Read DWORD from port in DX to [RDI] or ES:[EDI] or ES:[DI]

In 16-bit code these instructions would do:

insb      # Read BYTE from port in DX to ES:[DI]
insw      # Read WORD from port in DX to ES:[DI]
insl      # Read DWORD from port in DX to ES:[DI]

In 32-bit code these instructions would do:

insb      # Read BYTE from port in DX to ES:[EDI]
insw      # Read WORD from port in DX to ES:[EDI]
insl      # Read DWORD from port in DX to ES:[EDI]

In 64-bit code these instructions would do:

insb      # Read BYTE from port in DX to [RDI]
insw      # Read WORD from port in DX to [RDI]
insl      # Read DWORD from port in DX to [RDI]

Why do assemblers support the long form as well? Mainly for documentation purposes but there is a subtle change that can be expressed with the long form and that is the size of the memory address (not the size of the data to move). In GNU assembler this is supported in 16-bit code:

insb (%dx),%es:(%di)      # also applies to INSW and INSL
insb (%dx),%es:(%edi)     # also applies to INSW and INSL

16-bit code can use the 16-bit or 32-bit registers to form the memory operand, and this is how you can override it (another way, using the addr overrides, is described below). In 32-bit code it is possible to do this:

insb (%dx),%es:(%di)      # also applies to INSW and INSL
insb (%dx),%es:(%edi)     # also applies to INSW and INSL

It is possible to use the 16-bit registers in a memory operand in 32-bit code. There are very few use cases where this is particularly useful but the processor supports it. In 64-bit code you are allowed to use 32-bit or 64-bit registers in a memory operand, so in 64-bit code this is possible:

insb (%dx),%es:(%rdi)      # also applies to INSW and INSL
insb (%dx),%es:(%edi)      # also applies to INSW and INSL

There is a shorter way in GNU assembler to change the memory address size and that is using INSB/INSW/INSL with the addr16, addr32, and addr64 overrides. As an example in 16-bit code these are equivalent:

addr32 insb               # Memory address is %es:(%edi). also applies to INSW and INSL
insb (%dx),%es:(%edi)     # Same as above

In 32-bit code these are equivalent:

addr16 insb               # Memory address is %es:(%di). also applies to INSW and INSL
insb (%dx),%es:(%di)      # Same as above

In 64-bit code these are equivalent:

addr32 insb               # Memory address is %es:(%edi). also applies to INSW and INSL
insb (%dx),%es:(%edi)     # Same as above
