1

I'm learning about assembly language right now and I'm a bit confused about how the immediate values are encoded. Can someone explain why the following values are valid: 0xff00ff00, 0xffffffff, 0x007f8000? Also why are the values 0xff0000ff, 0x007f9000 invalid?

From my understanding, the 12 bit immediate is split into 4 upper bits of rotation and 8 lower bits of the constant. So I thought all of the values I listed above would be invalid because it would need more than 12 bits.

Some clarification on this topic would help so much, thanks!

Peter Cordes
  • 1
    The assembler can also use `mvn` with inverted bit patterns, for example; I think there's an existing Q&A about that somewhere. Also, if this is ARM64, then repeating bit-patterns are possible with bitwise booleans (but not immediates for other instructions like `add`), and thus for a pseudo-instruction that puts an immediate in a register. (Maybe also with some Thumb instructions in 32-bit ARM, I forget?) Have you tried looking at disassembly to see what encoding your assembler actually picked? The opcode might give you a big clue. – Peter Cordes Oct 14 '21 at 14:06
  • @PeterCordes i tried looking at the opcode but I am still confused – ahahahahana Oct 14 '21 at 14:25
  • 1
    Please clarify whether this is ARM32 or ARM64. Also, what's the actual instruction here? Different instructions allow different sets of immediates. – Nate Eldredge Oct 14 '21 at 14:50
  • 0xff00ff00 is definitely invalid. – Jake 'Alquimista' LEE Oct 14 '21 at 14:59
  • 1
    @Jake'Alquimista'LEE In Thumb code it is valid. OP didn't say whether he was using ARM or Thumb mode. – fuz Oct 14 '21 at 15:01
  • @fuz holy Sh*t. I completely forgot about thumb. No wonder the OP is so confused. Honestly, I don't understand professors teaching thumb assembly. You can hardly optimize anything in thumb assembly. – Jake 'Alquimista' LEE Oct 14 '21 at 15:17
  • @Jake'Alquimista'LEE I don't quite understand what you mean. You can do largely the same things in Thumb you can do in ARM. And the A32 (ARM) encoding is slowly becoming obsolete in favour of T32 (Thumb). – fuz Oct 14 '21 at 15:22
  • 1
    @fuz you are talking about thumb2 which is a slightly different story. The lack of conditional execution is however the deal breaker anyway. And besides, there are many professors still forcing their students to write hello world in original thumb. – Jake 'Alquimista' LEE Oct 14 '21 at 16:11
  • 1
    @Jake Thumb(2) supports conditional execution. And yes, of course Thumb2 is meant when I say Thumb. – fuz Oct 14 '21 at 16:47
  • @fuz I don't consider `IT` blocks a proper conditional execution - way too restrictive. – Jake 'Alquimista' LEE Oct 15 '21 at 03:34
  • @Jake'Alquimista'LEE `IT` blocks with just one instruction in them are parsed as instruction prefixes these days and do not cost extra cycles to execute. I.e. they do the exact same thing as conditional execution in A32 mode just with a potentially longer encoding. Not sure what you are missing. – fuz Oct 15 '21 at 08:45
  • @fuz only up to 5 consecutive instructions in the block if I remember correctly. I see no reason for using `thumb2` when there is `ARM32` mode where I can freely schedule all the instructions without the restrictive block. And the professors don't teach `thumb2` but the original `thumb` – Jake 'Alquimista' LEE Oct 16 '21 at 05:23
  • @Jake'Alquimista'LEE Blocks with more than one instruction in them are deprecated and single-instruction blocks behave like prefixes, i.e. they are decoded as a part of the following instruction. It's really quite identical to ARM mode. – fuz Oct 16 '21 at 09:10

2 Answers

3

(This answer is for ARM32 mode, not Thumb2 or AArch64. Things are different there, and allowed immediates can depend on the instruction.)

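You must be talking about the 12-bit immediate encoding. It is really a 4 + 8 bit encoding: 4 bits for the position (a rotate-right count) and 8 bits for the pattern. The 4-bit field is doubled to get the rotate count, so the count is always even (0, 2, ..., 30).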

  1. Any value from 0 to 255 is valid: a 0x00 ~ 0xff pattern at position 0.
  2. 256 and every other power of two is valid, since each is a 1-bit pattern.
  3. 257 isn't valid, since 0x101 requires a 9-bit pattern.
  4. 258 isn't valid: the pattern fits into 8 bits (129<<1), but its position is odd.
  5. 260 is valid (65<<2).
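
The five cases above can be checked mechanically. Here is a minimal Python sketch of the ARM32 rule (an 8-bit pattern rotated right by twice the 4-bit field). The names `rol32` and `encode_a32_imm` are mine, made up for illustration, and not taken from any ARM tool.

```python
def rol32(value, count):
    """Rotate a 32-bit value left by count bits."""
    count %= 32
    return ((value << count) | (value >> (32 - count))) & 0xFFFFFFFF


def encode_a32_imm(value):
    """Return the 12-bit field (rot << 8 | imm8), or None if not encodable.

    The CPU computes imm8 rotated right by 2*rot, so undoing that with a
    left rotation by 2*rot must leave something that fits in 8 bits.
    """
    value &= 0xFFFFFFFF
    for rot in range(16):              # 4-bit field, so 16 even rotations
        imm8 = rol32(value, 2 * rot)
        if imm8 <= 0xFF:
            return (rot << 8) | imm8
    return None


for n in (255, 256, 257, 258, 260):
    field = encode_a32_imm(n)
    print(n, "invalid" if field is None else f"valid, field 0x{field:03x}")
```

When more than one rotation works, this sketch simply returns the one with the smallest rotate field.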

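There are also instructions such as mvn and cmn that do something to the immediate before using it (mvn inverts it, cmn compares against its negation). If your instruction is mov or cmp, or another one that has such a counterpart, the assembler can switch to the counterpart, which makes it harder to tell at a glance whether a number is valid as an immediate.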
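
To illustrate the mov/mvn point, here is a rough Python sketch of the choice for ARM32: try the value itself, then its bitwise inverse, and otherwise fall back to a load from memory. The helper names (`encode_a32_imm`, `pick_mov`) are hypothetical, and a real assembler may have more options than these three.

```python
MASK32 = 0xFFFFFFFF


def encode_a32_imm(value):
    """Return rot << 8 | imm8 for an ARM32 immediate, or None."""
    for rot in range(16):
        n = (2 * rot) % 32
        # Rotate left by n to undo the CPU's rotate right by n.
        imm8 = ((value << n) | (value >> (32 - n))) & MASK32
        if imm8 <= 0xFF:
            return (rot << 8) | imm8
    return None


def pick_mov(value):
    """Choose mov, mvn, or a literal-pool load for a 32-bit constant."""
    value &= MASK32
    field = encode_a32_imm(value)
    if field is not None:
        return ("mov", field)
    field = encode_a32_imm(~value & MASK32)   # mvn writes NOT(immediate)
    if field is not None:
        return ("mvn", field)
    return ("ldr", None)                      # needs a constant in memory


for c in (0x81, 0x101, 0xEFFFFFF7, 0xFFFFF00F):
    print(hex(c), pick_mov(c))
```

So 0xEFFFFFF7 is not encodable directly, but its inverse 0x10000008 is, which is why it can still be loaded with a single instruction.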

PS: 4 bits only give 2^4 = 16 positions, but the register is 32 bits wide, so the field is multiplied by 2 to cover the whole register. That's why the position has to be even.

Peter Cordes
Jake 'Alquimista' LEE
2
.thumb

ldr r0,=0xFF00FF00

0:  f04f 20ff   mov.w   r0, #4278255360 ; 0xff00ff00


.thumb
.cpu cortex-m0

ldr r0,=0xFF00FF00

00000000 <.text>:
   0:   4800        ldr r0, [pc, #0]    ; (4 <.text+0x4>)
   2:   0000        .short  0x0000
   4:   ff00ff00    .word   0xff00ff00

Look at the ARM documentation: it clearly documents how the immediate encodings work, and also what you cannot do. The various Thumb2 extensions add more features, as shown above (armv6-m vs armv7-m (or -a)).
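
For the Thumb2 case, here is a Python sketch of the modified-immediate expansion as I read it in the ARMv7-M manual (the 12-bit i:imm3:imm8 field). The function names are mine, and it only covers this one encoding, not other ways of building a constant such as `mvn` or `movw`. It simply expands all 4096 field values and looks the constant up.

```python
def ror32(value, count):
    """Rotate a 32-bit value right by count bits."""
    count %= 32
    return ((value >> count) | (value << (32 - count))) & 0xFFFFFFFF


def thumb_expand_imm(imm12):
    """Expand a 12-bit Thumb2 modified-immediate field to 32 bits."""
    imm8 = imm12 & 0xFF
    if (imm12 >> 10) == 0:                 # top two bits 00: repeat patterns
        mode = (imm12 >> 8) & 0x3
        if mode == 0:
            return imm8                    # 0x000000XY
        if mode == 1:
            return (imm8 << 16) | imm8     # 0x00XY00XY
        if mode == 2:
            return (imm8 << 24) | (imm8 << 8)   # 0xXY00XY00
        return imm8 * 0x01010101           # 0xXYXYXYXY
    # Otherwise: 8-bit value with its top bit forced to 1, rotated by 8..31.
    return ror32(0x80 | (imm12 & 0x7F), imm12 >> 7)


# Map every reachable constant to the lowest field value that produces it.
THUMB2_IMMS = {}
for field in range(1 << 12):
    THUMB2_IMMS.setdefault(thumb_expand_imm(field), field)


def is_thumb2_imm(value):
    return (value & 0xFFFFFFFF) in THUMB2_IMMS


for c in (0xFF00FF00, 0xFFFFFFFF, 0x007F8000, 0xFF0000FF, 0x007F9000):
    print(f"0x{c:08x}", "valid" if is_thumb2_imm(c) else "invalid")
```

Under these rules the first three constants from the question come out valid and the last two invalid, and 0xFF00FF00 maps to field 0x2ff, which matches the `f04f 20ff` encoding in the listing above.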

As Jake points out, the 32-bit ARM instructions basically take 8 significant bits rotated by an even count (0, 2, 4, ..., 30).

ldr r0,=0x00000081
ldr r0,=0x00000101
ldr r0,=0x00000102
ldr r0,=0x00000204
ldr r0,=0x10000008
ldr r0,=0xEFFFFFF7
ldr r0,=0xFFFFF00F


00000000 <.text>:
   0:   e3a00081    mov r0, #129    ; 0x81
   4:   e59f0010    ldr r0, [pc, #16]   ; 1c <.text+0x1c>
   8:   e59f0010    ldr r0, [pc, #16]   ; 20 <.text+0x20>
   c:   e3a00f81    mov r0, #516    ; 0x204
  10:   e3a00281    mov r0, #268435464  ; 0x10000008
  14:   e3e00281    mvn r0, #268435464  ; 0x10000008
  18:   e3e00eff    mvn r0, #4080   ; 0xff0
  1c:   00000101    .word   0x00000101
  20:   00000102    .word   0x00000102
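
You can confirm the listing by decoding the low 12 bits of each `mov`/`mvn` word by hand. A small Python sketch of that, assuming the ARM32 data-processing immediate layout (opcode in bits 24:21, rotate field in bits 11:8, pattern in bits 7:0); `decode_a32_mov` is a made-up helper name:

```python
def ror32(value, count):
    """Rotate a 32-bit value right by count bits."""
    count %= 32
    return ((value >> count) | (value << (32 - count))) & 0xFFFFFFFF


def decode_a32_mov(word):
    """Return the constant an ARM32 mov/mvn immediate instruction loads."""
    if (word >> 25) & 0x7 != 0b001:
        raise ValueError("not a data-processing immediate instruction")
    opcode = (word >> 21) & 0xF
    rot = (word >> 8) & 0xF
    imm8 = word & 0xFF
    operand = ror32(imm8, 2 * rot)        # rotate count is twice the field
    if opcode == 0b1101:                  # mov
        return operand
    if opcode == 0b1111:                  # mvn: bitwise NOT of the operand
        return ~operand & 0xFFFFFFFF
    raise ValueError("not mov or mvn")


for word in (0xE3A00081, 0xE3A00F81, 0xE3A00281, 0xE3E00281, 0xE3E00EFF):
    print(f"{word:08x} -> 0x{decode_a32_mov(word):08x}")
```

For example, e3a00f81 has rotate field 0xf and pattern 0x81, so the operand is 0x81 rotated right by 30, which is 0x204.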

The ARM encodings are easier to understand than the Thumb encodings, but the ARM docs have examples that make the Thumb ones easier to follow.

Since you mentioned 0xFF00FF00, this means you are asking about armv7-a or armv7-m, yes?

old_timer
  • You will want the armv7-m architectural reference manual. – old_timer Oct 18 '21 at 03:42
  • The original thumb in all its variants (armv4t through armv7-a and the cortex-ms) is very boring, almost no immediates supported. A few bits here and there. – old_timer Oct 18 '21 at 03:44
  • "shifted by an even power of 2". I think you mean "multiplied by", otherwise just say "shifted by an even count", otherwise you're exponentiating twice. (i.e. shift by 4, 16, 64, ... would be the even powers of two.) – Peter Cordes Oct 18 '21 at 04:17
  • yep, wrote that wrong, forgot to fix it – old_timer Oct 18 '21 at 17:51
  • i need to use valid cortex m4 constants – ahahahahana Oct 19 '21 at 05:30
  • then armv7-m architectural reference manual. shows exactly how it works. if you use gnu assembler and use the ldr rx,=address syntax then the assembler will take care of it for you as shown above, if it fits it will generate the instruction if not it will generate a pc relative load from a local pool that it creates – old_timer Oct 19 '21 at 12:42
  • Is this instruction substitution (`ldr` to `mov` or `mvn`) enforced by the spec or it is assembler implementation specific? – Ilya Loskutov Mar 11 '23 at 11:41