
I'm focusing on a snippet of ARM assembly in which the add instruction is used. The snippet, shown below, simply says: add to the program counter the offset needed to find the string stored at L_.str, where L_.str is the symbol (the address) of a string contained in the data segment.

movw    r0, :lower16:(L_.str-(LPC1_0+4))
movt    r0, :upper16:(L_.str-(LPC1_0+4))
LPC1_0:
    add r0, pc

The first two instructions (movw and movt) load the 32-bit number representing the address of that string. I'm in Thumb mode, right? That said, I'm having difficulty figuring out the overall memory layout. Is the following the right representation of the code segment in memory? In addition, are LPC1_0 and L_.str the addresses of the add r0, pc instruction and of the "A simple string" string, respectively? What is the size of each box: 32 bits or 64 bits, depending on the architecture?

--------------------------------------------
| movw    r0, :lower16:(L_.str-(LPC1_0+4)) |
--------------------------------------------
| movt    r0, :upper16:(L_.str-(LPC1_0+4)) |
-------------------------------------------- LPC1_0
| add r0, pc                               |
--------------------------------------------
                       .
                       .
                       .
-------------------------------------------- L_.str
| "A simple string"                        |
--------------------------------------------

If so, I could just retrieve the offset (that will be added to the pc) using the difference L_.str-LPC1_0. But here +4 is also taken into account.

From ADD, pc or sp relative

ADD Rd, Rp, #expr

If Rp is the pc, the value used is: (the address of the current instruction + 4) AND &FFFFFFFC.

So it appears that if the pc is Rp, I also need to take 4 more bytes into account for the offset. OK, so where are these bytes added? Why are these 4 bytes accounted for in the movw/movt instructions and not just before the add instruction? Is this an optimization introduced by the compiler?

Lorenzo B
  • This has nothing to do with the stack - that is for runtime data, parameter passing, etc. - the code and data segments relevant here are a different thing entirely. Also, the add instruction here is [the register form](http://infocenter.arm.com/help/topic/com.arm.doc.dui0170b/BABCGJFF.html), not the pc-relative immediate form you've linked to - chances are the movw/movt/add sequence was emitted specifically because the offset is (or might be) beyond the range of that encoding. – Notlikethat Feb 23 '15 at 10:31
  • @Notlikethat Did you vote for closing the question? If so, let me know what is not clear to you in order to improve my question. Then, about your comment. You are saying that *This has nothing to do with the stack [...] the code and data segments relevant here are a different thing entirely.* So, what are they relevant for? – Lorenzo B Feb 23 '15 at 15:47
  • Finally, I'm not an ARM expert. So, could you explain the second part of your comment to me? Thank you. – Lorenzo B Feb 23 '15 at 15:48
  • This looks like a 'jump' to the string address. Some context would help. Is this an exploit? If it is normal compiler generation, then I guess this is a shared library and this is some sort of 'fix-up'. You don't really have an address to the string, but a 'plt/glt' fix-up which will patch with a 'run-time' address. You gave a lot of low level detail without high-level explanation. *focusing on ARM assembler*; from where that does what? Answer those to make a better question. – artless noise Feb 23 '15 at 16:17
  • Not me, I was just trying to help by clearing up some confusion - this is about computing an offset from the code segment to the data segment, both of which have layouts that are fixed at compile/link time. The stack is one of the areas for dynamic runtime data and isn't involved in any way here, so the mentions of it make no sense. – Notlikethat Feb 23 '15 at 16:36

2 Answers


My educated guess:

You want to get the "absolute" address where L_.str is in memory. movw and movt load immediate values, so the value is encoded inside the instructions themselves.

The compiler calculates the offset between LPC1_0 and L_.str and subtracts another 4 (bytes).

The add r0, pc instruction then adds the pc, which reads as the address of the add instruction plus 4, to that value.

The +4 is added by the processor. I think it is because the pc is incremented quite early in the processor's "logic", and the add can only read the value of pc afterwards. It's simpler to document that it is really pc+4 than to add extra logic to the processor to compute pc+4-4...

The advantage of this whole approach to calculating the address of L_.str is that it's independent of where the code is loaded, as long as the code and the string keep the same distance from each other.
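
To make the arithmetic above concrete, here is a small Python sketch that simulates the three instructions. The addresses (0x00008008 for LPC1_0, 0x00012340 for L_.str) and the helper names are made up purely for illustration; only the expression L_.str-(LPC1_0+4) comes from the snippet in the question.

```python
MASK32 = 0xFFFFFFFF  # registers are 32 bits wide, so arithmetic wraps


def movw_movt(value):
    """Rebuild a 32-bit value from its :lower16: and :upper16: halves."""
    lower16 = value & 0xFFFF           # what movw puts in the low half of r0
    upper16 = (value >> 16) & 0xFFFF   # what movt puts in the high half of r0
    return (upper16 << 16) | lower16


def resolve(lpc_addr, str_addr, load_delta=0):
    """Simulate movw/movt/add r0, pc (hypothetical helper).

    lpc_addr and str_addr are link-time addresses; load_delta models the
    whole image being loaded somewhere else at run time.
    """
    # Constant computed at build time: L_.str - (LPC1_0 + 4)
    imm = (str_addr - (lpc_addr + 4)) & MASK32
    r0 = movw_movt(imm)
    # In Thumb state, reading pc yields the instruction's address + 4
    pc_read = (lpc_addr + load_delta + 4) & MASK32
    return (r0 + pc_read) & MASK32


# Assumed example addresses, not taken from any real binary
LPC1_0 = 0x00008008
L_STR = 0x00012340

print(hex(resolve(LPC1_0, L_STR)))
print(hex(resolve(LPC1_0, L_STR, load_delta=0x10000)))
```

The -4 in the constant and the +4 that the processor contributes cancel out, so r0 ends up holding the run-time address of the string, even when the image is shifted by load_delta.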

DThought

The normal position-independent "get the address of something" instruction would be simply adr r0, L_.str (which is equivalent to having the assembler/linker automatically calculate an appropriate offset for add r0, pc, #offset). However, since the ARM architecture uses fixed-width encodings - ARM instructions are 32 bits wide, Thumb instructions are either 16 or 32 bits - there are only a limited number of bits of the instruction available to encode the immediate value for the offset, so the maximum range is limited. The maximum possible offset that a Thumb encoding of adr can support is +/-4095 bytes. Since the compiler has no idea how far apart the linker will put the sections, it can't safely emit adr for risk of the final offset being too big to assemble, so instead you get the 3-instruction generate immediate/add PC sequence. The advantage is that it can reach any 32-bit address; the tradeoff is that it takes up more space in the program image and instruction cache - adr alone is 2 or 4 bytes (depending on the offset and target register), while the movw/movt/add sequence weighs in at 10 bytes and takes at least twice as long to execute.
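
As a rough sketch of that range limit, the Python below checks whether a single Thumb adr could reach a target. The function name and the addresses are hypothetical; the +/-4095 limit is the one stated above, while the details of the 16-bit form (low register, forward offset that is a multiple of 4 and at most 1020) are my reading of the Thumb encodings and should be treated as an assumption.

```python
def thumb_adr_size(instr_addr, target_addr, low_register=True):
    """Return the size in bytes (2 or 4) of a Thumb adr that can reach
    target_addr, or None if it is out of range (hypothetical helper)."""
    # adr works from the word-aligned pc: (instruction address + 4) AND ~3
    base = (instr_addr + 4) & ~3
    offset = target_addr - base
    # Assumed 16-bit form: r0-r7 only, forward, multiple of 4, up to 1020
    if low_register and 0 <= offset <= 1020 and offset % 4 == 0:
        return 2
    # 32-bit form: +/-4095 bytes
    if -4095 <= offset <= 4095:
        return 4
    return None


# movw (4 bytes) + movt (4 bytes) + add r0, pc (2 bytes)
MOVW_MOVT_ADD_SIZE = 4 + 4 + 2

print(thumb_adr_size(0x8000, 0x8104))   # close by: short encoding fits
print(thumb_adr_size(0x8000, 0x12340))  # far away: adr cannot reach it
```

When the function returns None, the only option left is a sequence like movw/movt/add, which is what the compiler emits up front because it cannot know the final distance.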

As for why the PC offset is folded into the section offset, well, why wouldn't it be? Both are constant, so when the linker is calculating the distance between LPC1_0 and L_.str in the final image to encode the immediate value into the movw/movt instructions, it has absolutely nothing to gain by not adding the PC correction at the same time. That's why the 2-instruction fetch/execute offset of the original ARM's 3-stage pipeline was exposed in the first place, because it was considerably simpler to fix up addresses in the assembler/linker when building software, than to implement all the logic to "correct" it in hardware.
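
A small Python sketch of where the correction comes from, under the assumption that "2 instructions ahead" means 2 x 4 bytes in ARM state and 2 x 2 bytes in Thumb state (the +4 seen in the question). The function names and addresses are invented for illustration.

```python
def pc_read_offset(instruction_size):
    """pc reads as two instructions ahead of the one being executed."""
    return 2 * instruction_size


ARM_PC_OFFSET = pc_read_offset(4)    # ARM state: 4-byte instructions
THUMB_PC_OFFSET = pc_read_offset(2)  # Thumb state: 2-byte base instruction size


def folded_immediate(label_addr, target_addr, thumb=True):
    """Constant the toolchain would put into movw/movt: the section
    distance with the pc correction already folded in."""
    pc_offset = THUMB_PC_OFFSET if thumb else ARM_PC_OFFSET
    return (target_addr - (label_addr + pc_offset)) & 0xFFFFFFFF


# Assumed example addresses
print(hex(folded_immediate(0x8008, 0x12340, thumb=True)))
print(hex(folded_immediate(0x8008, 0x12340, thumb=False)))
```

Either way the correction is a build-time constant, so folding it into the immediate costs nothing at run time.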

Notlikethat
  • I understand what you meant in your previous comments. In particular, I fixed the sentence *Does the following is the right representation of the **code segment** of the stack memory?* by removing *stack*, since that was my mistake. Thanks. – Lorenzo B Feb 24 '15 at 09:35