So I had a simple ARM assembly (specifically THUMB) program being compiled for a TI Microcontroller. I'm just confused as to where EQU and DCD are stored in memory (RAM vs ROM) and how the AREA directive relates to that. I started off with this:

Y1      EQU     0x23


        AREA    |.text|, CODE, READONLY, ALIGN=2
        THUMB

X2      DCD     0x23
Y2      EQU     0x23

    MOV R0, #0
    LDR R1, =X2
    STR R0, [R1]

        END

I assumed that since EQUs are constant, they go in ROM. But here one of them is in the CODE section, which is READONLY (so I'm assuming that goes in ROM), and the other is in a section that has no AREA directive. I'm not sure what the default is there.

DCD was declared in a READONLY section, yet I'm still allowed to write to it.

If I add a DCD to the empty section I get an error: Area directive missing. If I add the AREA directive then the code looks like this:

    AREA    |.data|, DATA

X1      DCD     0x23
Y1      EQU     0x23


        AREA    |.text|, CODE, READONLY, ALIGN=2
        THUMB
        EXPORT  Start

X2      DCD     0x23
Y2      EQU     0x23

Start
    MOV R0, #0
    LDR R1, =X1
    STR R0, [R1]
    MOV R0, #0
    LDR R1, =X2
    STR R0, [R1]

        END

EQUs and DCDs are everywhere and the AREA directives don't seem to affect how I can access them at all. Adding READONLY to the AREA DATA directive has no effect either.

rcplusplus
  • The EQU directive doesn't allocate any memory and so it doesn't "go" anywhere. It just assigns a value to a symbol. The DCD directive allocates memory in the current section for one or more 32-bit words, initializes the words with the values given and then assigns the address of the first word allocated to the symbol. The AREA directive has no direct effect on whether anything goes in ROM or RAM. That depends on how the linker has been configured to map sections into ROM or RAM. The assembler won't give an error if you write to a READONLY section, and the CPU will likely ignore any writes to ROM. – Ross Ridge Apr 13 '17 at 01:53
  • assembly language is specific to the assembler, the software you are using. What assembler/software are you using here? ARM is not a sufficient answer as that is not a software program but the company that makes many cpu cores. – old_timer Apr 13 '17 at 02:22
  • as already stated EQU is just like a define in C: it has no storage, it is just a text substitution within the program, a search and replace that happens before assembling. DCD does allocate space, but within the current section; you can have, and need, data items in both .data and .text. In .text you need, for example, addresses that cannot be loaded as an immediate. The trick you demonstrated (ldr r1,=x2) gets around that by allocating a 32 bit data item in the .text space for that address. LINKING determines where .text and .data are, they could both be in ram if you want... – old_timer Apr 13 '17 at 02:24
  • @old_timer I'm using Keil uVision. Also, if EQU is just a substitution in the program, then why can I write `LDR R0, =SomeEQULabel`? What does it replace the label with? It can't put the whole immediate value as it's 32 bits, and LDR doesn't take an immediate value anyways – rcplusplus Apr 13 '17 at 02:32
  • using other assemblers you absolutely can use ldr r0,=0x12345678. What do you think a label is? an address, a 32 bit address so ldr r0,=someaddress is just saying when you figure out what the address is I want it in that register so allocate a location for me and replace this with a pc relative load. – old_timer Apr 13 '17 at 02:35
  • what you need to be doing is assembling then disassembling to see what is going on... – old_timer Apr 13 '17 at 02:37
  • @old_timer If THUMB has only 16 bit instructions, how can you have a 32 bit address as an immediate? True, we have a disassembly output window. I could probably take a look in there. – rcplusplus Apr 13 '17 at 02:37
  • what does the size of the instruction have to do with it, the 64 bit x86 processors use 8 bit opcodes. variable instruction length in units of 8 bits. – old_timer Apr 13 '17 at 02:38
  • Aren't immediate values part of the instructions? Like for example the last n bits in a 32 bit instruction are dedicated to storing the immediate values for say a MOV instruction. – rcplusplus Apr 13 '17 at 02:39
  • again assemble and disassemble, there is a pc relative load in thumb as well as arm. it has nothing to do with the size of the instruction, it says take the pc add some offset read the 8, 16, 32 or 64 bit value and place it in this one or these two registers. the ldm is a 16 bit instruction and it can read up to 8 registers – old_timer Apr 13 '17 at 02:39
  • no that is the whole point, the immediates in a fixed-ish instruction set like mips and arm are very limited, so when you cannot fit the immediate you use a pc relative load, you put the value in ram/rom and you ask for a load from ram/rom into the register, the immediate offset will fit in the pc relative load instruction. just assemble and disassemble...all your questions will be answered. – old_timer Apr 13 '17 at 02:40
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/141601/discussion-between-rcplusplus-and-old-timer). – rcplusplus Apr 13 '17 at 03:01

1 Answer

I am using an assembler I have access to (GNU assembler syntax) rather than yours. The answers should port between the two assembly languages, as a number of your questions are about the instruction set, not the assembly language.

.equ X1,0x12345678

.text
.thumb

.globl _start
_start:


ldr r0,=X1
ldr r1,=X2
ldr r2,[r1]
ldr r3,=Y4
ldr r4,=Y3
str r3,[r4]
bl bounce
mov lr,pc
ldr r5,=bounce
bx r5
b .
X2: .word 0xAABBCCDD

.thumb_func
bounce:
    bx lr
    nop

.data

Y3: .word 0
Y4: .word 0x11223344

Assemble, link, and disassemble:

00001000 <_start>:
    1000:   4807        ldr r0, [pc, #28]   ; (1020 <bounce+0x4>)
    1002:   4908        ldr r1, [pc, #32]   ; (1024 <bounce+0x8>)
    1004:   680a        ldr r2, [r1, #0]
    1006:   4b08        ldr r3, [pc, #32]   ; (1028 <bounce+0xc>)
    1008:   4c08        ldr r4, [pc, #32]   ; (102c <bounce+0x10>)
    100a:   6023        str r3, [r4, #0]
    100c:   f000 f806   bl  101c <bounce>
    1010:   46fe        mov lr, pc
    1012:   4d07        ldr r5, [pc, #28]   ; (1030 <bounce+0x14>)
    1014:   4728        bx  r5
    1016:   e7fe        b.n 1016 <_start+0x16>

00001018 <X2>:
    1018:   aabbccdd    bge feef4394 <X1+0xecbaed1c>

0000101c <bounce>:
    101c:   4770        bx  lr
    101e:   46c0        nop         ; (mov r8, r8)
    1020:   12345678    eorsne  r5, r4, #120, 12    ; 0x7800000
    1024:   00001018    andeq   r1, r0, r8, lsl r0
    1028:   00002004    andeq   r2, r0, r4
    102c:   00002000    andeq   r2, r0, r0
    1030:   0000101d    andeq   r1, r0, sp, lsl r0

Disassembly of section .data:

00002000 <__data_start>:
    2000:   00000000    andeq   r0, r0, r0

00002004 <Y4>:
    2004:   11223344            ; <UNDEFINED> instruction: 0x11223344

Disassembly of section .ARM.attributes:

00000000 <.ARM.attributes>:
   0:   00001341    andeq   r1, r0, r1, asr #6
   4:   61656100    cmnvs   r5, r0, lsl #2
   8:   01006962    tsteq   r0, r2, ror #18
   c:   00000009    andeq   r0, r0, r9
  10:   01090206    tsteq   r9, r6, lsl #4

So it took the ldr r0,=X1 (X1 being 0x12345678) and turned that into this

 1000:  4807        ldr r0, [pc, #28]   ; (1020 <bounce+0x4>)

and this

    1020:   12345678    eorsne  r5, r4, #120, 12    ; 0x7800000

The load (32 bits because it is an ldr; ldrb would be 8 bits (padded), ldrh 16 bits (padded)) takes the pc, which is two instructions ahead (and rounded down to a multiple of 4 for this addressing mode), and adds 28 to it: 0x1000 + 4 + 28 = 0x1000 + 32 = 0x1000 + 0x20 = 0x1020. At that address the assembler placed the data 0x12345678. The same goes for all the other =somethings...
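To check that arithmetic against every pc-relative load in the listing, here is a minimal Python sketch. The function name `thumb_ldr_literal_target` is made up for illustration; the bit layout is the 16-bit Thumb `LDR Rt, [PC, #imm]` encoding, and the addresses and encodings are copied from the disassembly above.

```python
def thumb_ldr_literal_target(insn_addr, halfword):
    """Return (rt, target_address) for a 16-bit Thumb 'LDR Rt, [PC, #imm]'."""
    # Bits 15..11 = 0b01001, bits 10..8 = Rt, bits 7..0 = imm8.
    if (halfword >> 11) != 0b01001:
        raise ValueError("not a 16-bit Thumb LDR (literal) instruction")
    rt = (halfword >> 8) & 0x7
    offset = (halfword & 0xFF) * 4  # imm8 counts words, so scale to bytes
    # PC reads as the instruction address + 4, rounded down to a multiple of 4.
    base = (insn_addr + 4) & ~0x3
    return rt, base + offset


# (address, encoding) pairs taken from the linked disassembly
for addr, enc in [(0x1000, 0x4807), (0x1002, 0x4908), (0x1006, 0x4B08),
                  (0x1008, 0x4C08), (0x1012, 0x4D07)]:
    rt, target = thumb_ldr_literal_target(addr, enc)
    print(f"{addr:#06x}: ldr r{rt}, [pc, #{(enc & 0xFF) * 4}] -> {target:#06x}")
```

The rounding step is why the load at 0x1002 with offset 32 lands on 0x1024 rather than 0x1026.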

I could have done that myself though and not relied on a pseudo instruction.

.text
.thumb

.globl _start
_start:

    ldr r0,xyz
    ldr r1,xyz_add
    ldr r2,[r1]
    b .

xyz: .word 0x12345678
xyz_add: .word xyz

Unlinked is good enough here:

00000000 <_start>:
   0:   4801        ldr r0, [pc, #4]    ; (8 <xyz>)
   2:   4902        ldr r1, [pc, #8]    ; (c <xyz_add>)
   4:   680a        ldr r2, [r1, #0]
   6:   e7fe        b.n 6 <_start+0x6>

00000008 <xyz>:
   8:   12345678    eorsne  r5, r4, #120, 12    ; 0x7800000

0000000c <xyz_add>:
   c:   00000008    andeq   r0, r0, r8

Because I have it in the same section, nearby, I can load the 0x12345678 directly with a pc-relative load, which is basically what the =0x12345678 pseudo-instruction does; I don't need to get the address and then load from that address. For far away things you can still place the address as a data item, load that, then load from it (double indirect).

.text
.thumb

.globl _start
_start:

    ldr r0,=0x11223344
    ldr r1,=5

at least with one assembler you can use the =something trick for everything and the assembler will hopefully optimize if it fits.

00000000 <_start>:
   0:   4800        ldr r0, [pc, #0]    ; (4 <_start+0x4>)
   2:   2105        movs    r1, #5
   4:   11223344

would have been nice if it went the other way and if you did a mov immediate it would do the load pc relative if it doesnt fit, but I dont think they do that.
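The choice the assembler made for the two =something loads above can be modelled in a few lines of Python. This is a deliberately simplified sketch: `choose_load` is a hypothetical helper, it only considers the 16-bit `MOVS Rd, #imm8` form, and a real assembler may know other encodings as well.

```python
def choose_load(rd, value):
    """Simplified model of 'ldr rd, =value' in 16-bit Thumb.

    Returns ("movs", encoding) when the value fits MOVS Rd, #imm8,
    otherwise ("literal", value), meaning a word goes in the literal pool
    and a pc-relative ldr fetches it.
    """
    value &= 0xFFFFFFFF
    if value <= 0xFF and 0 <= rd <= 7:
        # MOVS Rd, #imm8: bits 15..11 = 0b00100, bits 10..8 = Rd, bits 7..0 = imm8
        return ("movs", (0b00100 << 11) | (rd << 8) | value)
    return ("literal", value)


kind, payload = choose_load(1, 5)
print(kind, hex(payload))   # the same 0x2105 seen in the disassembly

kind, payload = choose_load(0, 0x11223344)
print(kind, hex(payload))   # too wide for imm8, so it becomes a pool word
```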

Now translate that to your assembly language. The AREA directive declares sections such as .text and .data; the linker later defines where those are placed. Some linkers can modify the code more than just an immediate offset, and some can replace the whole instruction at times (to trampoline off some linker-inserted code as needed). In this case the linker is going to fill in the addresses of things in the data locations the assembler allocated in the sections.

You can have data items in .text as well as .data. The .text data items are read-only things, be it constants like tables or addresses of things in other linked-in code or sections: things whose remote addresses the linker has to fill in, as they are not resolved at assemble time.
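As a rough illustration of that division of labour, the Python sketch below plays the linker for the first example: the assembler only knows section-relative offsets, and the final addresses appear once each section is given a base. The `link` function is hypothetical, and the base addresses 0x1000 and 0x2000 are simply the ones visible in the disassembly above; a real linker takes them from its linker script or scatter file.

```python
def link(section_bases, symbols):
    """Resolve section-relative symbols to absolute addresses.

    symbols maps name -> (section, offset, is_thumb_function).
    """
    resolved = {}
    for name, (section, offset, is_thumb_function) in symbols.items():
        address = section_bases[section] + offset
        if is_thumb_function:
            address |= 1  # Thumb function addresses carry bit 0 set for bx
        resolved[name] = address
    return resolved


# Offsets as the assembler sees them in the first example
symbols = {
    "X2":     (".text", 0x18, False),
    "bounce": (".text", 0x1C, True),
    "Y3":     (".data", 0x00, False),
    "Y4":     (".data", 0x04, False),
}

for name, address in link({".text": 0x1000, ".data": 0x2000}, symbols).items():
    print(f"{name:7s} {address:#010x}")
```

These are the four values that show up in the literal pool at 0x1024 through 0x1030. Changing the `.data` base changes Y3 and Y4 without touching the source, which is why AREA alone does not decide RAM versus ROM.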

EQU is historically the assembly language version of a simple define in C

#define ABCD 0x12345678

and before compiling a pass is done to search and replace instances of ABCD with 0x12345678. The same goes for the assembler. Unlike C, you might not be able to do more than a simple search and replace with it; assembler macros are a different syntax. But it is define-like.

DCD, DCB, etc. are like .word, .byte in the GNU assembler. They say: I want to put some raw data here, or allocate space for raw data here; not instructions but data, for whatever reason I want to use it.
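The difference between the two directives can be sketched with a toy one-section "assembler" in Python. This is only a model under simple assumptions (little-endian words, a single section, no expressions); the `assemble` function and its tuple input format are invented for illustration and are not how any real assembler is driven.

```python
import struct


def assemble(lines):
    """Toy model: lines are (label, directive, value) tuples for one section."""
    symbols = {}
    image = bytearray()
    for label, directive, value in lines:
        if directive == "EQU":
            # Only the symbol table changes; no bytes are emitted.
            symbols[label] = value
        elif directive == "DCD":
            # The label becomes the offset of the word, and 4 bytes are emitted.
            symbols[label] = len(image)
            image += struct.pack("<I", value)
        else:
            raise ValueError(f"unsupported directive: {directive}")
    return symbols, bytes(image)


symbols, image = assemble([("X2", "DCD", 0x23), ("Y2", "EQU", 0x23)])
print(symbols)     # X2 is an offset into the section, Y2 is just the number 0x23
print(len(image))  # only the DCD contributed bytes
```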

One would hope that if the assembler has a READONLY directive, it honors it; if it doesn't, that would bother me. But at the same time the well-used names .text and .data might trump that.

old_timer
  • And if after the search and replace, the literal value doesn't fit within the instruction, it does a PC-relative load, right (like in your example of `ldr r0,=0x11223344`)? – rcplusplus Apr 13 '17 at 03:45
  • certainly for gnu assembler. I dont know about the assembly language you are using...just try it assemble and disassemble and see what it does. or go the other way and do a mov r0,#SOMETHING and EQU something to a simple and not simple constant and see. – old_timer Apr 13 '17 at 03:49
  • Note the ARM immediate encoding is easy to wrap your head around: an 8 bit value that can be rotated in two bit units, so 0xAB000000 or 0xB000000A and such. The thumb immediate encoding is much harder to wrap your head around, clearly a number like 1 or 3 they can handle, but others you have to go look. – old_timer Apr 13 '17 at 03:52
  • Yeah I looked at the disassembly and it seems that large EQU values (like a 32 bit value) are DCD'd into a memory location and then PC-relative addressing is used to get it. – rcplusplus Apr 13 '17 at 04:11
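The ARM-mode immediate rule mentioned in the comments (an 8 bit value rotated right by an even amount) is easy to test for. The Python sketch below does so by brute force; `ror32` and `arm_immediate_encoding` are illustrative helper names, and this covers only the classic ARM data-processing immediate, not the Thumb encodings.

```python
def ror32(value, amount):
    """Rotate a 32-bit value right by amount bits."""
    amount %= 32
    return ((value >> amount) | (value << (32 - amount))) & 0xFFFFFFFF


def arm_immediate_encoding(value):
    """Return (imm8, rotate_right) if value is an ARM immediate, else None."""
    value &= 0xFFFFFFFF
    for rot in range(0, 32, 2):
        # Rotating right by (32 - rot) is a rotate left by rot, which
        # undoes the rotate right the hardware would apply to imm8.
        imm8 = ror32(value, (32 - rot) % 32)
        if imm8 <= 0xFF:
            return imm8, rot
    return None


for v in (0xAB000000, 0xB000000A, 0x12345678):
    print(hex(v), arm_immediate_encoding(v))
```

A value that returns None here is one that has to come from a literal pool (or be built up in several instructions), which is what the disassembly showed for the large EQU values.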