Moving memcpy into another code section

Question

I am building a piece of software meant to run on an ARM Cortex-M0+ microcontroller. It includes a USB bootloader of sorts that runs as a secondary program upon a call to a function. I'm having an issue with the insertion of the memcpy function during compilation.

Background

The linker script is where it all starts. Most of it is pretty straightforward and standard. The program is stored in .text and is executed from there as well. Everything in .text is stored in the flash section of the chip.

The strangeness is the part where the bootloader runs. In order to be able to write all of the flash without overwriting the bootloader code, my bootloader entry point initiates a copy of the bootloader program into the SRAM portion of the microcontroller and then executes it from there. This way, the bootloader can safely erase all of the flash on the device without inadvertently deleting itself.

This is implemented with a faked "overlay" in the linker script (the real OVERLAY didn't quite match my use case):

/**
 * The bootloader and general ram live in the same area of memory
 * NOTE: The bootloader gets its own special RAM space and it lives on top
 * of both .data and .bss.
 */

_shared_start = .;
.bootloader _shared_start : AT(_end_flash)
{
    /* We keep the bootloader and its data together */
    _start_bootloader_flash = LOADADDR(.bootloader);
    _start_bootloader = .;
    *(.bootloader.data)
    *(.bootloader.data.*)
    . = ALIGN(1024); /* Interrupt vector tables must be aligned to a 1024-byte boundary */
    *(.bootloader.interrupt_vector_table)
    *(.bootloader)
    _end_bootloader = .;
}

.data _shared_start : AT(_end_flash + SIZEOF(.bootloader))
{
    _start_data_flash = LOADADDR(.data);
    _start_data = .;
    *(.data)
    *(.data.*)
    *(.shdata)
    _end_data = .;
}
. = _shared_start + SIZEOF (.data);
_bootloader_size = _end_bootloader - _start_bootloader;
_data_size = _end_data - _start_data;

_end_flash is a reference to the end of the previous section which stored all of its data in flash (.text, .rodata, .init...basically anything read-only gets stuck there).

The effect is that the .data and .bss sections live in RAM as usual, while the .bootloader section is linked to run from the same RAM addresses. Both .bootloader and .data are stored sequentially in flash at build time. In my crt0 routines, the .data section is copied from the flash into its appropriate address in RAM (specified by _start_data) and the .bss section is zeroed. I have an additional piece of code stored in the .text section which initiates the bootloader by copying it from the flash into RAM, overwriting whatever was in .data and .bss. The only exit from the bootloader is a system reset, so it is OK that it destroys the data of the running program. After copying the bootloader into RAM, that code jumps to it.
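
As a rough illustration of that startup sequence (a sketch only, not the actual crt0: the helper names `copy_words` and `zero_words` are made up for this example, and the `_start_bss`/`_end_bss` symbols mentioned in the comment are assumed, since they are not shown in the linker script excerpt), the copies amount to something like this:

```c
#include <stdint.h>

/* Copy 32-bit words from src into the range [dst, dst_end). */
static void copy_words(uint32_t *dst, const uint32_t *dst_end,
                       const uint32_t *src)
{
    while (dst < dst_end) {
        *dst++ = *src++;
    }
}

/* Fill the range [dst, dst_end) with zero words. */
static void zero_words(uint32_t *dst, const uint32_t *dst_end)
{
    while (dst < dst_end) {
        *dst++ = 0u;
    }
}

/*
 * On the target, the arguments would be the linker-script symbols, e.g.:
 *
 *   extern uint32_t _start_data_flash, _start_data, _end_data;
 *   extern uint32_t _start_bootloader_flash, _start_bootloader, _end_bootloader;
 *
 *   // crt0: initialise .data, then clear .bss (bss symbol names assumed)
 *   copy_words(&_start_data, &_end_data, &_start_data_flash);
 *   zero_words(&_start_bss, &_end_bss);
 *
 *   // bootloader entry (lives in .text): overwrite RAM with the bootloader
 *   copy_words(&_start_bootloader, &_end_bootloader, &_start_bootloader_flash);
 */
```

The helpers take plain pointers so the same loops serve both the .data copy and the bootloader copy; only the symbols passed in differ.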

The Question

Obviously, there are some possible issues with compiling an overlaid program and making sure all the references line up. In order to mitigate issues that would crop up accessing bootloader code from the normal program or accessing the normal .data or .bss from the bootloader, I have the following three lines in my linker script:

NOCROSSREFS(.bootloader .text);
NOCROSSREFS(.bootloader .data);
NOCROSSREFS(.bootloader .bss);

Now, whenever there is a cross-reference between the .bootloader section and .text (which might be erased by the bootloader), .data (which the bootloader lives on top of), or .bss (again, the bootloader lives on top of it), the linker issues an error.

This worked great until I actually started writing code. Part of my code includes some struct copying and other such things. Apparently, the compiler decided to do this (bootloader_ functions live in the .bootloader section):

20000340 <bootloader_usb_endp0_handler>:
...
20000398:   1c11        adds    r1, r2, #0
2000039a:   1c1a        adds    r2, r3, #0
2000039c:   f000 f8e0   bl  20000560 <__memcpy_veneer>
...
20000560 <__memcpy_veneer>:
20000560:   b401        push    {r0}
20000562:   4802        ldr r0, [pc, #8]    ; (2000056c <__memcpy_veneer+0xc>)
20000564:   4684        mov ip, r0
20000566:   bc01        pop {r0}
20000568:   4760        bx  ip
2000056a:   bf00        nop
2000056c:   00000869    andeq   r0, r0, r9, ror #16

In my chip's architecture, addresses from 0x20000000 up to 0xE0000000 or so are located in SRAM (I only have 4 KB of that actually on the device). Any address below 0x1ffffc00 is located in the flash section.

The problem is this: In my function located in my .bootloader section (bootloader_usb_endp0_handler), a reference to memcpy (2000039c, 20000562, and 2000056c) was inserted because I'm doing a struct copy among other things. The reference it put to memcpy is at address 0x00000869, which lives in the flash...which could be erased.

The particular code is:

static setup_t last_setup;
last_setup = *((setup_t*)(bdt->addr));

Where setup_t is a two-word struct and bdt->addr is a void* which I know points to data that looks like a setup_t. This line generates the call to memcpy.

My question is: I'd really like to keep my struct copying. It is convenient. Is there any way to tell the compiler to place memcpy into a specific section other than the default? I want that to happen just for the bootloader module. All the other code can have its memcpy...I just want a special copy for my bootloader module that lives inside .bootloader.

If this simply isn't possible, I'm going to either write the entire bootloader in assembly (not as fun) or go the route of compiling the bootloader separately, including it as a fairly long hexadecimal string in the end program, and executing the string after copying it to RAM. The string route doesn't appeal to me very well because it is breakable and difficult to implement...so any other suggestions would also be appreciated.

The compilation line for this module is:

arm-none-eabi-gcc -Wall -fno-common -mthumb -mcpu=cortex-m0plus -ffreestanding -fno-builtin -nodefaultlibs -nostdlib -O0 -c src/bootloader.c -o obj/bootloader.o

Normally the optimization would be -Os, but I was trying to get rid of the memcpy...it didn't work.

Also, I've looked at this question and it didn't fix the problem.

I don't actually call `memcpy`...gcc inserts it when I do a struct copy, so I'm not sure writing it as a macro would help. I edited my post with the culprit lines of code. — Los Frijoles, Jan 23 '15 at 07:24
Sorry I missed that part. So do you supply a memcpy implementation? Isn't it strange that gcc still inserts memcpy operations with -fno-builtin flag? — auselen, Jan 23 '15 at 07:28
`arm-none-eabi-gcc` seems to have some memcpy implementation that it inserts into my code. It is indeed weird that the `-fno-builtin` still creates the reference. — Los Frijoles, Jan 23 '15 at 07:29
I wonder if this might be considered as a bug on gcc. Which version you are using? What happens if you add your own memcpy implementation? — auselen, Jan 23 '15 at 07:32
I'm not so familiar with M0+, but I have questions: Is it mandatory to run bootloader in RAM? With M3 and M4 it isn't, that make you able to place bootloader at first pages and never erase it. Why are you developing bootloader and application in the same firmware? Could you make them as two different firmware: bootloader starting from 0 and application starting from 0+bootloader_size+spares. This means that you can link memcopy in both firmware. — LPs, Jan 23 '15 at 07:37
It is not mandatory. I decided against simply reserving the first block as the bootloader and having two separate programs because I wanted to actually be able to update the bootloader using the bootloader itself and not hooking up some hacked together SWD cable to my device whenever I needed to update it. I do agree, however, that that method is much cleaner and would solve all of my issues (and make my linker script much less convoluted). I might end up going that route anyway. This is almost more of an experiment. — Los Frijoles, Jan 23 '15 at 07:39
@LosFrijoles It's possible to go around that limitation by creating an application that contains a bootloader, and when run just writes that new bootloader over the old one. That way you can do a complete update in two steps: first flash such a "boot replacement application", run that, then flash the actual desired new application using the newly-replaced boot loader. — unwind, Jan 23 '15 at 08:20
If you use `struct` assignment the compiler has to implement it somehow. One reasonable way to do this is to always use `memcpy` for that. I wouldn't consider this to be a bug. And `-fno-builtin` is probably just the wrong thing to do, you *want* builtin `memcpy` implemented directly in the code and not the *extern* `memcpy` as provided by the library. — Jens Gustedt, Jan 23 '15 at 08:38

score 0 · Answer 1 · answered Jan 23 '15 at 07:39

I never tried, but you might get away with using the EXTERN() linker script directive to force-load your newlib memcpy() twice: first in the bootloader link stage into your desired section, and later, after undefining it, a second time into your "normal" code.
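
Another option, sketched here under assumptions and not tested on the asker's toolchain, is to skip the struct assignment in the bootloader and call a small copy helper that itself lives in .bootloader. The names `BOOTLOADER_CODE` and `bootloader_copy` are hypothetical, and the `setup_t` layout is a guess (the question only says it is a two-word struct). The volatile pointers are there to discourage GCC from recognising the loop and turning it back into a memcpy call; note that the GCC manual states that even freestanding code is expected to provide memcpy, memmove, memset and memcmp, which would explain why -fno-builtin did not remove the reference.

```c
#include <stddef.h>
#include <stdint.h>

/* Place the helper in the .bootloader section on ELF targets.
 * On other hosts the attribute is dropped so the sketch still compiles. */
#if defined(__GNUC__) && defined(__ELF__)
#define BOOTLOADER_CODE __attribute__((section(".bootloader"), noinline))
#else
#define BOOTLOADER_CODE
#endif

/* Hypothetical layout: the question only says "a two-word struct". */
typedef struct {
    uint32_t word0;
    uint32_t word1;
} setup_t;

/* Byte-wise copy that stays inside .bootloader.
 * Volatile accesses must be performed one by one, so the compiler
 * should not replace this loop with a call to the library memcpy. */
BOOTLOADER_CODE
static void bootloader_copy(void *dst, const void *src, size_t len)
{
    volatile uint8_t *d = dst;
    const volatile uint8_t *s = src;

    while (len--) {
        *d++ = *s++;
    }
}
```

In the handler, the assignment `last_setup = *((setup_t*)(bdt->addr));` would then become `bootloader_copy(&last_setup, bdt->addr, sizeof last_setup);`. It is worth checking the disassembly afterwards to confirm that no `__memcpy_veneer` is generated for the bootloader module.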
