In the Intel(R) 64 and IA-32 Architectures Software Developer's Manual, Combined Volumes: 1, 2A, 2B, 2C, 2D, 3A, 3B, 3C, 3D, and 4 (Order Number: 325462-080US, June 2023):

2.7.5 Compressed Displacement (disp8*N) Support in EVEX

For memory addressing using disp8 form, EVEX-encoded instructions always use a compressed displacement scheme by multiplying disp8 in conjunction with a scaling factor N that is determined based on the vector length, the value of EVEX.b bit (embedded broadcast) and the input element size of the instruction.
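To make the quoted scheme concrete, here is a minimal sketch of the check an assembler might perform once it already knows N. The function name `compress_disp8` is hypothetical (it is not from the manual), and how N is chosen is deliberately left out.

```python
def compress_disp8(disp: int, n: int):
    """Return the signed value to store in the disp8 byte, or None.

    disp is the byte displacement written in the source, n is the
    scaling factor N. None means disp8*N cannot represent disp and
    a disp32 is needed instead.
    """
    if n <= 0:
        raise ValueError("scaling factor N must be positive")
    # The displacement must be an exact multiple of N ...
    if disp % n != 0:
        return None
    # ... and the quotient must fit in a signed 8-bit integer.
    quotient = disp // n
    if -128 <= quotient <= 127:
        return quotient
    return None


if __name__ == "__main__":
    print(compress_disp8(64, 32))   # multiple of 32, small quotient
    print(compress_disp8(60, 32))   # not a multiple of 32
```

With N = 1 the same function describes the ordinary unscaled disp8 used by legacy-SSE and VEX encodings.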

I want to know exactly what "the input element size of the instruction" means.

I have two questions about "the input element size":

  1. What is the exact definition of "input element size"?
  2. Is there a general rule for determining the input element size of the instruction in each row of the instruction tables in the Intel PDF manual?

I'm writing an x86-64 assembler that supports SSE/AVX/AVX2/AVX-512. If I get the input element size wrong, it will encode the wrong displacement value in the disp8 byte.

Peter Cordes
YutakaAoki
  • The **Disp8*N** scaling factor depends on the **EVEX.L'.L.b** bits and the *tuple* defined by Intel for each instruction, see my [macro](https://euroassembler.eu/easource/ii.htm#IiDisp8EVEX). When the displacement is not a multiple of the scaling factor, or the quotient doesn't fit into a signed 8-bit integer, you'll need to encode the displacement as **disp32**. – vitsoft Jul 15 '23 at 08:12
  • @vitsoft: Thank you for your comment. But I don't know about your macro yet. – YutakaAoki Jul 15 '23 at 20:11
  • [This example](https://euroassembler.eu/eatests/t5150.htm) shows when **disp8*N** is not applicable. Whenever **disp8*32** is used, it is indicated in the hexdump with a `<5` decoration; otherwise the instruction is encoded with `disp32`. – vitsoft Jul 16 '23 at 06:03
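As a rough sketch of how the vector length, the EVEX.b bit, and the tuple type mentioned in these comments combine into N, the function below covers only three tuple types (Full, Half, Full Mem). The name `disp8_scale` and the tuple-type strings are my own labels, not Intel's, and the remaining tuple types should be taken from the tables in Section 2.7.5 rather than from this snippet.

```python
def disp8_scale(tuple_type: str, vl_bits: int, broadcast: bool,
                elem_bytes: int = 4) -> int:
    """Return the disp8*N factor N in bytes for a few EVEX tuple types.

    tuple_type : "full", "half" or "full_mem" (hypothetical labels)
    vl_bits    : vector length selected by EVEX.L'L (128, 256 or 512)
    broadcast  : True when EVEX.b requests an embedded broadcast
    elem_bytes : input element size in bytes (4 for m32bcst, 8 for m64bcst)
    """
    if vl_bits not in (128, 256, 512):
        raise ValueError("vector length must be 128, 256 or 512 bits")
    vl_bytes = vl_bits // 8
    if tuple_type == "full":
        # Broadcast loads one element; otherwise a whole vector.
        return elem_bytes if broadcast else vl_bytes
    if tuple_type == "half":
        # Broadcast loads one element; otherwise half a vector.
        return elem_bytes if broadcast else vl_bytes // 2
    if tuple_type == "full_mem":
        if broadcast:
            raise ValueError("Full Mem does not allow embedded broadcast")
        return vl_bytes
    raise ValueError(f"tuple type not covered by this sketch: {tuple_type}")


if __name__ == "__main__":
    print(disp8_scale("full", 256, broadcast=False))
    print(disp8_scale("full", 256, broadcast=True, elem_bytes=4))
```

Note that the element size only enters the result for these tuple types when a broadcast is actually used; without one, the vector length alone decides N.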

1 Answer

The "input element size of the instruction" is the size of each data element the instruction reads from its source operand, as opposed to the size of the whole vector. It varies with the specific instruction and the type of data it operates on.

For example, in SIMD instructions it is the size of the elements processed in parallel. To find it for a given instruction, refer to that instruction's entry in the manual, where the size is given in bits or bytes. For EVEX encodings, the Tuple Type column of the instruction's operand encoding table, read together with the tuple tables in Section 2.7.5, determines the resulting scaling factor N.

algorythms
  • Is there a general rule for determining the input element size of the instruction in each row of the instruction tables in the Intel PDF manual? – YutakaAoki Jul 15 '23 at 11:44
  • @Peter Cordes: By the way, the Tuple Type column of an instruction without an EVEX prefix is marked "N/A". That means it does NOT support compressed displacement (disp8*N), right? So the factor is N=1, and a disp8 of 1 means a displacement of 1, doesn't it? – YutakaAoki Jul 15 '23 at 17:32
  • The last operand of an instruction that supports broadcast looks like xmm3/m128/m32bcst. I think the 32-bit memory operand "m32" implies that the element size is 32 bits, so N = 32 (bits) / 8 = 4 (bytes). Is that correct? – YutakaAoki Jul 15 '23 at 17:47
  • @YutakaAoki: Right, scaled disp8 was new with AVX-512's EVEX encoding, since unrolling with 64-byte vectors easily exceeds the limits of -128..+127 and needs a disp32 with only a few vector-widths. (It would have been helpful sometimes with SSE and AVX, but only with AVX-512 EVEX did Intel actually make addressing-mode encoding/decoding more complicated to do something about the problem. Hence the N/A tuple type for non-EVEX.) So yes, legacy-SSE and VEX encodings always use unscaled disp8. – Peter Cordes Jul 15 '23 at 17:51
  • @Peter Cordes: Thanks. Additionally, about a last operand such as xmm3/m128/m32bcst: I think the 32-bit memory operand "m32" implies that the "input element size" is 32 bits, so N = 32 (bits) / 8 = 4 (bytes). Is that correct? – YutakaAoki Jul 15 '23 at 17:56
  • @YutakaAoki: Yes, `m32bcst` means it supports stuff like `vaddps ymm0, ymm1, dword [rdi]{1to8}`, with a 4-byte memory operand that's broadcast. And this is also the disp8 scale factor, like I said in my first comment. – Peter Cordes Jul 15 '23 at 18:15
  • @YutakaAoki: Correction to my first comment (which I've deleted): supporting `m32bcst` doesn't mean the scale factor is `4`. That's only the case when you actually *use* a broadcast source operand. If you look at NASM's machine code for `vaddps ymm16, ymm1, [rdi+64]`, the disp8 is `2`: two full vectors, since the tuple type is "full". `[rdi+60]` has to use a `disp32` since it's not a multiple of 32, unless you use `[rdi+60]{1to8}`. Same for `vpackssdw` with the same operands. – Peter Cordes Jul 16 '23 at 05:02
  • (The disp8 is the last byte of the machine code when there's no immediate operand, so it's very easy to find in the `objdump -drwC -Mintel` disassembly when checking what existing known-good assemblers do.) – Peter Cordes Jul 16 '23 at 05:07