How does the RMI Instruction Operand Encoding of ROUNDSS work?

Question

A few x86 instructions like ROUNDSS require this seemingly obscure instruction operand encoding, on which I can't find any documentation or definition in Intel's Software Developer's Manual.

How are the bits of this encoding used? I put 66 0f 3a 0b c0 0c (roundsd xmm0,xmm0,0xc) into a disassembler and varied the bits to gain a better understanding, but could only access half the XMM registers.

I'm also unclear on the meaning of

128-bit Legacy SSE version: The first source operand and the destination operand are the same.

as, e.g., 66 0f 3a 0b c1 0c is disassembled without warning or error to roundsd xmm0,xmm1,0xc.

In `roundsd xmm0,xmm1,0xc`, the high bytes to merge the result into are read from `xmm0`, and then the unchanged high half with the round result in the low half are written back to `xmm0`. The VEX version would allow reading a different merge source. Same as with other scalar one-input XMM instructions like `sqrtss` that are unfortunately designed with a false dependency on the destination, instead of zero-extending. — Peter Cordes, Jan 02 '22 at 04:28
Understood, so reg1 != reg2 is valid, thanks! Btw, also thanks for the additional tags! — soc, Jan 02 '22 at 05:33
With the legacy encoding, you need a REX prefix to access the other half of the registers (the new VEX prefix has those bits built in) — harold, Jan 02 '22 at 07:49
This is not really obscure. RMI is also used for the three operand form of `imul` for example. And that dates back to the 386. It's just reg+r/m+imm. I.e. RM but there's also an immediate. — fuz, Jan 02 '22 at 12:23
If I understand this correctly, this means the second-to-last byte encodes two operands, 3 bits for the first, 3 bits for the second ... and two bits for something else (mode?)? — soc, Jan 02 '22 at 23:06
@soc Correct. This is called a *modr/m byte*. It's the same as with any other instruction where the instruction encoding is given with a `/r` or `/0` to `/7` element. Refer to the Intel Software Developer's Manual for how the modr/m and SIB bytes work. — fuz, Jan 04 '22 at 12:49
@soc Is there anything missing with my answer? If yes, please let me know so I can amend it with the bits you are looking for. — fuz, Jan 14 '22 at 14:29
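As a small illustration of the split described in the comments, here is a hypothetical helper (`decode_modrm` is a made-up name) that pulls the mod, reg and r/m fields out of a modr/m byte and, optionally, extends reg and r/m with the R and B bits of a REX prefix. The reg and r/m numbers only name XMM registers directly when mod = 11; memory operands also need the SIB and displacement handling shown in the answer.

```python
def decode_modrm(modrm, rex=0x40):
    """Split a modr/m byte into (mod, reg, rm).

    `rex` is an optional REX prefix byte (0x40-0x4F); its R bit supplies
    bit 3 of reg and its B bit supplies bit 3 of rm (register form).
    """
    mod = modrm >> 6              # bits 7:6
    reg = (modrm >> 3) & 0b111    # bits 5:3
    rm = modrm & 0b111            # bits 2:0
    rex_r = (rex >> 2) & 1
    rex_b = rex & 1
    return mod, reg | (rex_r << 3), rm | (rex_b << 3)

# The modr/m bytes from the question: mod = 3 means both operands are registers
print(decode_modrm(0xC0))        # (3, 0, 0) -> xmm0, xmm0
print(decode_modrm(0xC1))        # (3, 0, 1) -> xmm0, xmm1
print(decode_modrm(0xC1, 0x45))  # (3, 8, 9) -> xmm8, xmm9 with REX.R and REX.B
```

With a REX byte of 45 placed between the 66 prefix and 0F, i.e. 66 45 0f 3a 0b c1 0c, the same c1 byte selects xmm8 and xmm9, which is the other half of the registers mentioned in harold's comment.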

fuz · Accepted Answer · 2022-01-04T13:15:53.243

Legacy Encoding

The encoding is as follows:

66 0F 3A 0A /r ib

The opcode is 0A in the 0F 3A opcode plane, and a mandatory 66 prefix must be supplied. The opcode is followed by a modr/m byte (/r) that encodes the first operand in the reg field and the second operand in the r/m field (plus a SIB byte and a displacement if the r/m operand is a memory operand that needs them). An 8-bit immediate (ib) encoding the third operand comes last.
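A minimal sketch of the register-register form of this encoding (mod = 11, so r/m names an XMM register rather than memory). `encode_roundss_reg` is a hypothetical helper name; it adds a REX prefix only when either register is xmm8-xmm15, as the worked example explains.

```python
def encode_roundss_reg(dst, src, imm):
    """Encode `roundss xmm<dst>, xmm<src>, imm` for register numbers 0-15."""
    out = [0x66]                                   # mandatory prefix
    rex = 0x40 | (dst >> 3) << 2 | (src >> 3)      # REX.R for reg, REX.B for r/m
    if rex != 0x40:
        out.append(rex)                            # only needed for xmm8-xmm15
    out += [0x0F, 0x3A, 0x0A]                      # opcode plane prefix + opcode
    out.append(0b11 << 6 | (dst & 7) << 3 | (src & 7))  # modr/m with mod = 11
    out.append(imm & 0xFF)                         # ib
    return bytes(out)

print(encode_roundss_reg(1, 2, 4).hex(" "))    # 66 0f 3a 0a ca 04
print(encode_roundss_reg(9, 10, 4).hex(" "))   # 66 45 0f 3a 0a ca 04
```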

Let's encode for example

roundss xmm8, [rdx+r9*8+64], 0xc

We have xmm8 and r9 as “upper” registers, so a REX.RX prefix 46 must be supplied to hold the extra bits.
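The REX prefix has the bit layout 0100WRXB. A small sketch (the `rex` helper name is made up for illustration) shows how setting only R and X yields 46:

```python
def rex(w=0, r=0, x=0, b=0):
    """Build a REX prefix byte with layout 0100WRXB."""
    return 0x40 | w << 3 | r << 2 | x << 1 | b

# xmm8 in the reg field needs REX.R, r9 as the SIB index needs REX.X
print(hex(rex(r=1, x=1)))  # 0x46
```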

The modr/m byte is 44 indicating an 8 bit displacement (mod = 01), presence of a SIB byte (r/m = 100), and xmm8 as a reg operand (reg = 000, REX.R set).
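Packing the modr/m byte is just a matter of shifting the three fields into place; only the low three bits of a register number go into the byte, since bit 3 lives in the REX prefix. The `modrm` helper below is a hypothetical name used for illustration:

```python
def modrm(mod, reg, rm):
    """Pack mod (2 bits), reg (3 bits) and r/m (3 bits) into a modr/m byte."""
    return (mod & 0b11) << 6 | (reg & 0b111) << 3 | (rm & 0b111)

# mod = 01 (disp8), reg = xmm8 (low bits 000; bit 3 goes to REX.R), r/m = 100 (SIB follows)
print(hex(modrm(0b01, 8, 0b100)))  # 0x44
```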

The SIB byte is CA indicating rdx as the base (base = 010), r9 as the index (index = 001, REX.X set) and a scale of 8 (scale = 11).
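The SIB byte follows the same pattern, with the scale stored as its base-2 logarithm. Again, `sib` is only an illustrative helper name:

```python
def sib(scale, index, base):
    """Pack a SIB byte; scale is 1, 2, 4 or 8, index and base are register numbers 0-15."""
    ss = {1: 0b00, 2: 0b01, 4: 0b10, 8: 0b11}[scale]
    return ss << 6 | (index & 0b111) << 3 | (base & 0b111)

# base = rdx (2), index = r9 (9; bit 3 goes to REX.X), scale 8
print(hex(sib(8, 9, 2)))  # 0xca
```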

Then follows the displacement 40 (i.e. decimal 64).

Finally, we have the 8 bit immediate 0c.

These bytes are then assembled in the order: legacy prefixes, REX prefix, opcode plane prefix, opcode, modr/m byte, SIB byte, displacement, immediate. So the entire instruction comes out as

66 46 0F 3A 0A 44 CA 40 0C
|  |  |     |  |  |  |  \... immediate
|  |  |     |  |  |  \...... displacement
|  |  |     |  |  \......... SIB byte
|  |  |     |  \............ modr/m byte
|  |  |     \............... opcode
|  |  \..................... opcode plane prefix
|  \........................ REX prefix
\........................... mandatory prefix
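Putting the steps together, here is a sketch of an encoder for this particular addressing form, assuming a signed 8-bit displacement, register numbers 0-15, and an index register other than rsp (an index field of 100 without REX.X means "no index"). The name `encode_roundss_mem` is hypothetical:

```python
def encode_roundss_mem(dst, base, index, scale, disp8, imm):
    """Encode `roundss xmm<dst>, [base + index*scale + disp8], imm`."""
    assert index != 4, "rsp cannot be used as an index register"
    assert -128 <= disp8 <= 127, "only an 8-bit displacement is handled here"
    ss = {1: 0, 2: 1, 4: 2, 8: 3}[scale]
    rex = 0x40 | (dst >> 3) << 2 | (index >> 3) << 1 | (base >> 3)
    out = [0x66]                                         # mandatory prefix first
    if rex != 0x40:
        out.append(rex)                                  # REX right before the opcode bytes
    out += [0x0F, 0x3A, 0x0A]                            # opcode plane prefix + opcode
    out.append(0b01 << 6 | (dst & 7) << 3 | 0b100)       # modr/m: disp8, SIB follows
    out.append(ss << 6 | (index & 7) << 3 | (base & 7))  # SIB
    out.append(disp8 & 0xFF)                             # disp8 (two's complement)
    out.append(imm & 0xFF)                               # ib
    return bytes(out)

# roundss xmm8, [rdx+r9*8+64], 0xc   (rdx = 2, r9 = 9)
print(encode_roundss_mem(8, 2, 9, 8, 64, 0x0C).hex(" "))  # 66 46 0f 3a 0a 44 ca 40 0c
```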

VEX Encoding

128-bit Legacy SSE version: The first source operand and the destination operand are the same.

The VEX-encoded variant of the instruction, vroundss, has an additional source operand that supplies the upper three elements of the result. The legacy-encoded version has no such operand and reads these elements from the destination operand instead.
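To make the data flow concrete, here is a toy model (not Intel's pseudocode) that treats an XMM register as four floats with element 0 as the low 32 bits. The function names are made up, and `floor` stands in for round toward negative infinity, one of the modes the immediate can select. Only the 128 XMM bits are modelled; bits 255:128 of the YMM register (left unchanged by the legacy form, zeroed by VEX.128) are ignored.

```python
import math

def roundss_legacy(dst, src, rnd):
    """roundss dst, src: low element rounded from src, upper three kept from dst."""
    return [rnd(src[0])] + dst[1:]

def vroundss(src1, src2, rnd):
    """vroundss dst, src1, src2: low element rounded from src2, upper three from src1."""
    return [rnd(src2[0])] + src1[1:]

def floor(v):
    # stand-in for one rounding mode selected by the immediate
    return float(math.floor(v))

xmm0 = [1.5, 2.0, 3.0, 4.0]
xmm1 = [7.7, 8.0, 9.0, 10.0]
print(roundss_legacy(xmm0, xmm1, floor))  # [7.0, 2.0, 3.0, 4.0]
print(vroundss(xmm0, xmm1, floor))        # same result as vroundss xmm0, xmm0, xmm1
```

This is also why the legacy form has a dependency on the old value of its destination, as noted in the comments on the question.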

If we want to encode this instruction as the VEX-encoded variant

vroundss xmm8, xmm2, [rdx+r9*8+64], 0xc

instead, we start with a VEX prefix. This prefix subsumes the mandatory prefix, the REX prefix, and the opcode plane prefix into one 3-byte prefix, which has the form:

11000100 RXBmmmmm WvvvvLpp
R, X, B: complemented REX prefix bits
W: REX.W prefix bit (not complemented)
m: opcode plane (1: `0F`, 2: `0F 38`, 3: `0F 3A`)
L: vector length (0: 128 bit, 1: 256 bit)
p: mandatory prefix (0: none, 1: `66`, 2: `F3`, 3: `F2`)
v: complemented extra source register number

A shorter 2 byte VEX prefix

11000101 RvvvvLpp

can be used when REX.X, REX.B, and REX.W are clear and m = 00001 (the 0F opcode plane). This is not the case here. The encoding is given as

VEX.LIG.66.0F3A.WIG 0A /r ib VROUNDSS xmm1, xmm2, xmm3/m32, imm8

indicating that the L and W fields are ignored (LIG, WIG), that there is an implied 66 mandatory prefix, and that the opcode is 0A in the 0F 3A opcode plane, followed by a modr/m byte and a byte immediate. The first and third operands are encoded in the modr/m byte; the second operand is the additional operand encoded in the vvvv field of the VEX prefix.
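The two prefix layouts can be packed with a short sketch like the one below. `vex_prefix` is a hypothetical helper; it takes the REX-style bits and the register number uncomplemented, complements them itself, and picks the 2-byte form whenever X, B and W are clear and the opcode plane is 0F.

```python
def vex_prefix(r, x, b, w, mmmmm, vvvv, l, pp):
    """Build a VEX prefix.

    r, x, b, w: 1 if the corresponding REX bit would be set
    mmmmm: opcode plane (1 = 0F, 2 = 0F 38, 3 = 0F 3A)
    vvvv: extra source register number 0-15 (stored complemented)
    l: vector length (0 = 128 bit), pp: implied prefix (0 = none, 1 = 66, 2 = F3, 3 = F2)
    """
    v = ~vvvv & 0b1111                          # complement the register number
    if x == 0 and b == 0 and w == 0 and mmmmm == 1:
        # 2-byte form: 11000101 RvvvvLpp
        return bytes([0xC5, (r ^ 1) << 7 | v << 3 | l << 2 | pp])
    # 3-byte form: 11000100 RXBmmmmm WvvvvLpp
    byte1 = (r ^ 1) << 7 | (x ^ 1) << 6 | (b ^ 1) << 5 | mmmmm
    byte2 = w << 7 | v << 3 | l << 2 | pp
    return bytes([0xC4, byte1, byte2])

# vroundss xmm8, xmm2, [rdx+r9*8+64], 0xc: REX.R and REX.X needed, plane 0F 3A, prefix 66
print(vex_prefix(1, 1, 0, 0, 3, 2, 0, 1).hex(" "))  # c4 23 69
# all extension bits clear, plane 0F, no prefix: the 2-byte form (as in vzeroupper, C5 F8 77)
print(vex_prefix(0, 0, 0, 0, 1, 0, 0, 0).hex(" "))  # c5 f8
```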

So we have for our instruction

R = 0, indicating presence of REX.R
X = 0, indicating presence of REX.X
B = 1, indicating absence of REX.B
W = 0, indicating absence of REX.W (ignored)
L = 0, indicating a 128 bit operand size (ignored)
m = 00011, indicating the 0F 3A opcode plane
p = 01, indicating a 66 mandatory prefix
v = 1101, indicating xmm2 as the first source operand (the second operand)

This gives the VEX prefix C4 23 69. The opcode, modr/m byte, SIB byte, displacement, and immediate are identical to the legacy encoding, giving the full instruction

C4 23 69 0A 44 CA 40 0C
|        |  |  |  |  \... immediate
|        |  |  |  \...... displacement
|        |  |  \......... SIB byte
|        |  \............ modr/m byte
|        \............... opcode
\........................ VEX prefix
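The whole VEX-encoded instruction can be assembled in one go. This sketch (the name `encode_vroundss_mem` is hypothetical) makes the same assumptions as the legacy example: a signed 8-bit displacement, register numbers 0-15, and no rsp index. It always emits the 3-byte prefix, since the 0F 3A plane rules out the 2-byte form.

```python
def encode_vroundss_mem(dst, src1, base, index, scale, disp8, imm):
    """Encode `vroundss xmm<dst>, xmm<src1>, [base + index*scale + disp8], imm`."""
    assert index != 4, "rsp cannot be used as an index register"
    assert -128 <= disp8 <= 127, "only an 8-bit displacement is handled here"
    ss = {1: 0, 2: 1, 4: 2, 8: 3}[scale]
    r, x, b = dst >> 3, index >> 3, base >> 3
    vex1 = (r ^ 1) << 7 | (x ^ 1) << 6 | (b ^ 1) << 5 | 0b00011  # ~R ~X ~B, plane 0F 3A
    vex2 = (~src1 & 0b1111) << 3 | 0b01                         # W = 0, ~vvvv, L = 0, pp = 66
    return bytes([
        0xC4, vex1, vex2,                         # 3-byte VEX prefix
        0x0A,                                     # opcode
        0b01 << 6 | (dst & 7) << 3 | 0b100,       # modr/m: disp8, SIB follows
        ss << 6 | (index & 7) << 3 | (base & 7),  # SIB
        disp8 & 0xFF,                             # disp8
        imm & 0xFF,                               # ib
    ])

# vroundss xmm8, xmm2, [rdx+r9*8+64], 0xc
print(encode_vroundss_mem(8, 2, 2, 9, 8, 64, 0x0C).hex(" "))  # c4 23 69 0a 44 ca 40 0c
```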
