
I am sort of a newbie to assembly language, and I need help understanding how mnemonics are converted directly to bytes.

For example, I have a line saying

b 0x00002B78

which is located at the memory address 0x00002A44. How does this translate to EA00004B (the byte representation of the above assembly)? I am under the impression that the "EA00" signifies the "b" branching part of the assembly, but what about the "004B"? If anyone can give a general explanation of this, plus resources for looking up such conversions, that would be appreciated. I tried googling this, but I am not really sure what to search for, and what I have found so far has not been helpful.

Sidd Singal
  • @PaulSullivan I sort of figured that too, but how does the 004B actually correlate between 0x2A44 and 0x2B78? – Sidd Singal Aug 20 '13 at 16:58
  • At the time the branch is taken, `PC` will be 8 bytes ahead of the current instruction due to prefetching. So take the destination address (0x2B78), subtract 0x2A44+8 (0x2A4C) and you get 0x12C. Divide by 4 (since the offset stored in the instruction will be shifted left 2 bits) and you end up with 0x4B. – Michael Aug 20 '13 at 17:08
  • @Michael Thanks for that bit about "prefetching" I was wondering why PC would be 8 bytes ahead – Sidd Singal Aug 20 '13 at 17:14

1 Answer


All the information you're looking for is in the ARM Architecture Reference Manual. If you look up the b instruction, you'll see its encoding and how it works. Here's the specific instruction you care about:

[Figure: excerpt from the ARM docs showing the B/BL instruction encoding]

The E is the condition field, which you can look up in this table:

[Figure: table of condition field encodings]

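For you, it's "execute always" (AL). Next comes the A, which is 1010 in binary and fills bits 27:24: bits 27:25 are 101, which identifies a branch, and bit 24 (the L bit) is 0 because you have a branch instruction, not a branch & link instruction. Lastly, the rest of the instruction (bits 23:0) is the immediate offset field. It's a PC-relative offset counted in 4-byte words, not an absolute address, which is why it's encoded as 0x00004b.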
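To make the field layout concrete, here is a minimal Python sketch that splits the word from your question into those fields. The function name `decode_branch` is just something I made up for illustration; it isn't part of any ARM tool.

```python
def decode_branch(word):
    """Split a 32-bit ARM B/BL encoding into its fields (illustrative helper)."""
    cond = (word >> 28) & 0xF    # bits 31:28, condition code (0xE = AL)
    opcode = (word >> 25) & 0x7  # bits 27:25, 0b101 for B/BL
    link = (word >> 24) & 0x1    # bit 24, the L bit (0 = B, 1 = BL)
    imm24 = word & 0xFFFFFF      # bits 23:0, offset field
    return cond, opcode, link, imm24

cond, opcode, link, imm24 = decode_branch(0xEA00004B)
print(hex(cond), bin(opcode), link, hex(imm24))  # 0xe 0b101 0 0x4b
```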

Let's look at your specific example now. You have the instruction:

b 0x00002B78

located at address 0x00002a44. OK, great. So first off, we can stick in the opcode bits:

cccc 101L xxxx xxxx xxxx xxxx xxxx xxxx

Now, the L bit is zero for our case:

cccc 1010 xxxx xxxx xxxx xxxx xxxx xxxx

We want to execute this instruction unconditionally, so we add the AL condition code bits:

1110 1010 xxxx xxxx xxxx xxxx xxxx xxxx

And now all we have to do is calculate the offset. The PC will read as 0x2a4c when this instruction is executed (in ARM state the PC is always "current instruction + 8"), so our relative jump needs to be:

0x2b78 - 0x2a4c = 0x12c

Great - now we apply the reverse of the transformations described in the documentation above, right-shifting 0x12c by two:

0x12c / 4 = 0x4b = 0b1001011
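The same arithmetic can be sketched in Python. This is only a rough illustration under the assumptions already stated here (ARM state, PC reads as instruction address + 8, offset stored in words); `branch_imm24` is a hypothetical helper name, not something from the manual.

```python
def branch_imm24(instr_addr, target_addr):
    """Compute the 24-bit offset field for an ARM-state B/BL (illustrative helper)."""
    pc = instr_addr + 8                  # PC as seen by the executing instruction
    offset = target_addr - pc            # byte distance, may be negative
    if offset % 4 != 0:
        raise ValueError("offset must be a multiple of 4 bytes")
    word_offset = offset >> 2            # the field counts words, not bytes
    if not -(1 << 23) <= word_offset < (1 << 23):
        raise ValueError("target is out of range for a 24-bit signed field")
    return word_offset & 0xFFFFFF        # two's complement, truncated to 24 bits

print(hex(branch_imm24(0x2A44, 0x2B78)))  # 0x4b
```

The final mask only matters for backward branches, where the word offset is negative and is stored in two's complement.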

And that's the last field:

1110 1010 0000 0000 0000 0000 0100 1011

Turning that binary instruction back into hex gives you the instruction encoding you were looking for:

0xea00004b
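If you want to double-check the result, a small Python sketch can put the fields together and then run the calculation backwards from the encoded word to the target address. Both function names are my own, chosen for illustration, and the sketch assumes ARM state with the usual "PC = instruction address + 8" rule.

```python
def assemble_branch(cond, link, imm24):
    """Pack condition, L bit and offset field into a B/BL word (illustrative helper)."""
    return (cond << 28) | (0b101 << 25) | (link << 24) | (imm24 & 0xFFFFFF)

def branch_target(instr_addr, word):
    """Recover the branch target from an encoded B/BL word (illustrative helper)."""
    imm24 = word & 0xFFFFFF
    if imm24 & 0x800000:                    # sign bit of the 24-bit field is set
        imm24 -= 1 << 24                    # sign-extend to a negative offset
    return instr_addr + 8 + (imm24 << 2)    # PC (+8) plus byte offset

word = assemble_branch(0xE, 0, 0x4B)
print(hex(word))                         # 0xea00004b
print(hex(branch_target(0x2A44, word)))  # 0x2b78
```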
Carl Norum
  • I was about to comment with another question before you just edited, but that's an extremely thorough and helpful response! Thank you so much for writing this up (and thank you for the link). – Sidd Singal Aug 20 '13 at 17:08
  • Happy to help; let me know if there's anything I can clarify further. – Carl Norum Aug 20 '13 at 17:11