First, some background. When firmware crashes for whatever reason (e.g. stack overflow, corrupted function pointer...), it may jump to an arbitrary address and start executing whatever code is there. Sooner or later this results in a watchdog reset. The MCU resets and we are back on track. Unless...

What about when we have code that writes to flash (e.g. a bootloader)? Now it can happen that we accidentally jump directly into the flash write code, skipping all the checks. Before the watchdog barks, you end up with corrupted firmware. This is exactly what was happening to me.

Now some might say: fix the root bug that made us jump into the write code in the first place. Well, when you are developing, you are constantly changing the code. Even if there is no such bug in there at the moment, there might be one tomorrow. Besides, no code is bug free - or at least not mine.

So now I am doing a kind of cross-check. I have a variable named 'wen' which I set to 0xa5 before the usual checks (e.g. a check that the destination is valid). Then, just before doing the actual erase or write, I check that 'wen' really is set to 0xa5. If it is not, we somehow jumped into the writing code by accident. After a successful write, 'wen' is cleared. I have done this in C and it worked well. But there is still a slight theoretical chance of corruption, because there are a few instructions between this final check of 'wen' and the write to the SPMCR register.
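For readers who want to see the shape of this guard, here is a minimal host-side sketch in C. It is not the original firmware: `fake_flash`, `FLASH_SIZE`, `destination_is_valid`, `flash_write_guarded` and `flash_write_byte` are hypothetical names, and a plain array stands in for real flash so the logic can be run on a PC.

```c
#include <stdbool.h>
#include <stdint.h>

#define WEN_MAGIC   0xA5u   /* guard value mentioned in the post */
#define FLASH_SIZE  256u    /* hypothetical size of the simulated flash */

volatile uint8_t wen;              /* write-enable guard, 0 when disarmed */
uint8_t fake_flash[FLASH_SIZE];    /* stand-in for real flash memory */

/* Hypothetical validity check; real code would check page ranges etc. */
bool destination_is_valid(uint16_t adr)
{
    return adr < FLASH_SIZE;
}

/* Final step: refuses to write unless the guard was armed beforehand. */
bool flash_write_guarded(uint16_t adr, uint8_t value)
{
    if (wen != WEN_MAGIC) {
        /* We got here without passing the checks; real code would reset. */
        return false;
    }
    fake_flash[adr] = value;
    wen = 0;    /* disarm after a successful write */
    return true;
}

/* Normal entry point: arm the guard, run the checks, then write. */
bool flash_write_byte(uint16_t adr, uint8_t value)
{
    wen = WEN_MAGIC;
    if (!destination_is_valid(adr)) {
        wen = 0;    /* checks failed, disarm again */
        return false;
    }
    return flash_write_guarded(adr, value);
}
```

Calling `flash_write_guarded` directly, which is what a stray jump would amount to, is rejected because 'wen' was never armed; only the path through `flash_write_byte` can write.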

Now I want to improve this by moving the check into assembly, between the write to SPMCR and the spm instruction.

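__asm__ __volatile__
(
    "lds __zero_reg__, %0\n\t"       /* load 'wen' from SRAM into r1 */
    "out %1, %2\n\t"                 /* write the page-erase command to the SPM control register */
    "ldi r25, %3\n\t"                /* r25 = -ACK (8-bit) */
    "add __zero_reg__, r25\n\t"      /* r1 = wen - ACK, zero only if wen == ACK */
    "brne spm_fail\n\t"              /* guard not set: skip the spm */
    "spm\n\t"                        /* guard ok (r1 is zero again): start the operation */
    "rjmp spm_done\n\t"
    "spm_fail: clr __zero_reg__\n\t" /* restore r1 to 0, as the compiler expects */
    "call __assert\n\t"
    "spm_done:"
    :
    : "i" ((uint16_t)(&wen)),
      "I" (_SFR_IO_ADDR(__SPM_REG)),
      "r" ((uint8_t)(__BOOT_PAGE_ERASE)),
      "M" ((uint8_t)(-ACK)),
      "z" ((uint16_t)(adr))
    : "r25"
);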

I haven't tried the code yet; I will do that tomorrow. Do you see any problems? How do/would you solve such a problem?

Stefan

2 Answers

One technique I've seen is to make sure the bytes immediately before your flash write routines will trigger some sort of watchdog timeout, or reset the processor. That way, it's not possible to execute random data leading up to the flash write function and just "fall into" the function.

You may need to have some NOPs before your reset to ensure the instructions are interpreted correctly.

Your technique of verifying that the function has run from the beginning looks like a good one, assuming you clear the wen variable once you've done the write.

tomlogic
  • Yes, wen is cleared after successful write. This call to __assert actually triggers watchdog reset (plus it logs some info about what triggered it). Glad to hear that people actually use such approaches :) – Stefan Feb 17 '12 at 07:01

I'm not sure why you need to have the capability to write to flash in your bootloader. Our bootloader does, because it can update the application program via the serial port. So we eliminate the potential for inadvertent writes by ensuring that the loader does not contain any of the code that writes to flash. That code is downloaded as a header in the same package that contains the image to be written. The onboard image has the checksum of the programming algorithm stored, and verifies it before running it.
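As a rough illustration of the "verify before running" step, here is a sketch in C. The answer does not say which checksum is used, so the simple 16-bit additive sum below is an assumption, and `checksum16` and `programming_routine_is_trusted` are hypothetical names.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed checksum: 16-bit additive sum. A real design might prefer a CRC. */
uint16_t checksum16(const uint8_t *data, size_t len)
{
    uint16_t sum = 0;
    for (size_t i = 0; i < len; i++) {
        sum = (uint16_t)(sum + data[i]);
    }
    return sum;
}

/* Returns true only if the downloaded routine matches the checksum
 * stored in the onboard image. The caller runs the routine only then. */
bool programming_routine_is_trusted(const uint8_t *routine, size_t len,
                                    uint16_t stored_checksum)
{
    return len > 0 && checksum16(routine, len) == stored_checksum;
}
```

The point of the design is that until a matching routine has been downloaded, there is simply no flash-writing code on the device for a stray jump to land in.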

If you are writing things that are generated internally, then I would look at hardware-related interlocks. Only allow writes if you have previously set a particular discrete output pin to ON. To answer the problem of "what if the IP jumps past the checks?", you can do it in 2 parts. First, set some critical variables for the algorithm (the address to write to, for example: keep that initialized to invalid memory, and only set it correctly in a separate call made before the write). Then have the write function check your HW interlock. Do one of the enable steps in an interrupt, or in response to a timer: something that is unlikely to be hit in the correct sequence if you have a rogue IP.
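A host-side sketch of this two-part enable might look like the following. Everything here is an assumption made for illustration: `interlock_pin_on` is a plain variable standing in for the output pin, `INVALID_ADDR` is a made-up sentinel, and `fake_flash` replaces real flash.

```c
#include <stdbool.h>
#include <stdint.h>

#define INVALID_ADDR 0xFFFFu   /* hypothetical sentinel, never a writable address */
#define FLASH_SIZE   256u      /* hypothetical size of the simulated flash */

bool     interlock_pin_on = false;         /* stand-in for the discrete output pin */
uint16_t pending_addr     = INVALID_ADDR;  /* stays invalid until step 1 runs */
uint8_t  fake_flash[FLASH_SIZE];           /* stand-in for real flash memory */

/* Step 1: a separate call that sets the target address. */
void flash_prepare(uint16_t addr)
{
    pending_addr = addr;
}

/* Step 2: enable the interlock, e.g. from a timer or interrupt handler. */
void flash_enable_interlock(void)
{
    interlock_pin_on = true;
}

/* Step 3: write only if both earlier steps happened, then re-lock. */
bool flash_commit(uint8_t value)
{
    bool ok = interlock_pin_on && pending_addr < FLASH_SIZE;
    if (ok) {
        fake_flash[pending_addr] = value;
    }
    /* Always return to the safe state, whether or not the write happened. */
    interlock_pin_on = false;
    pending_addr = INVALID_ADDR;
    return ok;
}
```

A rogue jump straight into `flash_commit` finds the address invalid and the interlock off, so it would have to hit all three steps in the right order to cause a write.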

If your IP can really jump anywhere, it may be impossible to prevent an inadvertent write. The best you can hope for is to ensure that the only path to get there also sets up everything else needed for a successful write.

AShelly
  • I'm doing the same thing (update via UART). There is write code in the application and in the bootloader, so they can cross-update each other, and I also store some configuration in flash. I don't know what uC you have, but the one I have cannot execute code from RAM, so uploading code for writing is not an option. Actually, I am using a procedure similar to the one you described with the HW interlock. This is what I tried to explain when I asked the question... looks like I did a bad job :) – Stefan Feb 17 '12 at 18:09
  • @Stefan: be careful with updating your bootloader from your application. What happens on power loss between the erase and write? Oops, your bootloader is gone and you've bricked your device. On a Freescale HCS08 project, the bootloader actually sets CPU registers to protect it from being overwritten, and it's impossible for the application to write to those pages in the flash. Be safe. – tomlogic Feb 17 '12 at 18:49
  • @tomlogic: I have 2 write codes - in the application and in the bootloader - and they can cross-update each other. The uC by default wakes into the application code. If I fail to update the bootloader, I still wake into the application and can retry. And the bootloader uses a trick: it writes the application code from the highest page to the lowest. Before it writes the first (which is actually the highest) page, it writes 'jump to bootloader' into page 0. Now the only concern is that it should not fail while writing the very last page (page 0). So the chances of ending up with non-recoverable flash are extremely small. – Stefan Feb 18 '12 at 00:21
  • @tomlogic: ... The reason that I need to write to flash from the application code is actually not to update the bootloader, but that I have some configuration stored in flash which I want to update from the application (I found it much easier to use flash than EEPROM). So the part of the application code that writes to flash has to be in the "bootloader" part of flash. Now, if I lock the bootloader, this actually means I have locked part of the application. I see you really have experience with such things, so I would be glad to hear what you think, since I have no one to discuss such things with. – Stefan Feb 18 '12 at 00:44
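
The top-down update order described in the comments above can be sketched on a host machine as follows. This is only an illustration of the ordering, not the real bootloader: the page geometry, the `JUMP_TO_BOOTLOADER` marker byte, and the names `write_page`, `write_log` and `update_application` are all hypothetical.

```c
#include <stdint.h>
#include <string.h>

#define PAGE_SIZE           4u     /* hypothetical, tiny for illustration */
#define APP_PAGES           4u     /* hypothetical number of application pages */
#define JUMP_TO_BOOTLOADER  0xB0u  /* made-up marker standing in for a jump instruction */

uint8_t  fake_flash[APP_PAGES * PAGE_SIZE];  /* stand-in for application flash */
unsigned write_log[APP_PAGES + 1];           /* order of page writes, for inspection */
unsigned write_count;

/* Simulated page write that also records which page was written. */
void write_page(unsigned page, const uint8_t *src)
{
    memcpy(&fake_flash[page * PAGE_SIZE], src, PAGE_SIZE);
    if (write_count < APP_PAGES + 1) {
        write_log[write_count++] = page;
    }
}

/* image points to APP_PAGES * PAGE_SIZE bytes of new application code. */
void update_application(const uint8_t *image)
{
    uint8_t trap[PAGE_SIZE];

    /* 1. Page 0 first gets 'jump to bootloader', so an interrupted
     *    update wakes into the bootloader and can be retried. */
    memset(trap, JUMP_TO_BOOTLOADER, PAGE_SIZE);
    write_page(0, trap);

    /* 2. Write the application from the highest page down to page 1. */
    for (unsigned page = APP_PAGES - 1; page > 0; page--) {
        write_page(page, &image[page * PAGE_SIZE]);
    }

    /* 3. Only the very last write replaces the trap with the real page 0. */
    write_page(0, image);
}
```

With four pages the recorded order is 0, 3, 2, 1, 0, so the only write whose failure leaves page 0 in an unknown state is the final one.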