
I am trying to fix a bug found in a mature program for the Fujitsu MB90F543. The program has been working for nearly 10 years, but it was discovered that under some special circumstances it fails to do two things at its very beginning. One of them is crucial.

After low- and high-level initialization (ports, pins, peripherals, IRQ handlers), configuration data is read over SPI from an EEPROM and the status LEDs are turned on for a moment (to turn them on, data is sent over SPI to an LED driver). When those special circumstances occur, the first, and only the first, function that performs a few EEPROM reads fails, and additionally a few of the LEDs that should turn on do not.

The program is written in C and compiled using Softune v30L32. Surprisingly, adding a single __asm(" NOP ") in the low-level hardware init is sufficient to make the program work as expected under the mentioned circumstances. Turning off 'Control optimization of pointer aliasing' in the Optimization settings is also sufficient. Adding just a few lines of code in various places helps too.

I have compared (diffed) the ASM listings of the compiled program for versions with and without __asm(" NOP "), and with the aforementioned optimizer option both on and off, and they all look just fine.

The only warning the Softune compiler has been printing for years during compilation is as follows:

*** W1372L: The section is placed outside the RAM area or the I/O area (IOXTND)

I do realize it's a rather general question, but maybe someone who has a bigger picture will be able to point out a possible cause.

Do you have any idea what may cause such weird behaviour? How can I locate the bug and fix it?

During the initialization a few long (about 20 ms) delay loops are used. Increasing them from about 2 ms did not help, yet a single NOP in any line of the hardware initialization function, or even before or after the function, does.

Both wait loops work. I have checked this using an oscilloscope (I added an LED turn-on before each loop and a turn-off after it).
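The oscilloscope check described above can be sketched roughly as follows. This is only an illustration written for a host compiler, not the original firmware: `debug_pin`, `debug_edges`, `busy_wait` and `measured_wait` are hypothetical names, and the "pin" is a plain variable standing in for a port bit.

```c
#include <stdint.h>

/* Hypothetical stand-in for an LED/debug port bit (not a real register). */
volatile uint8_t  debug_pin;
/* Counts level changes, i.e. the edges a scope would show. */
volatile uint32_t debug_edges;

static void debug_pin_set(uint8_t level)
{
    if (level != debug_pin) {
        debug_edges++;
    }
    debug_pin = level;
}

/* Simple busy-wait; volatile keeps the loop from being optimized away. */
void busy_wait(volatile uint32_t iterations)
{
    while (iterations--) {
        /* burn cycles */
    }
}

/* Bracket the delay with a rising and a falling edge so its real
 * duration can be read off the oscilloscope as a pulse width. */
void measured_wait(uint32_t iterations)
{
    debug_pin_set(1);
    busy_wait(iterations);
    debug_pin_set(0);
}
```

The pulse width seen on the scope is then the actual duration of the loop, independent of what the optimizer did to the surrounding code.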

I have checked the timing hypothesis by slowing down the SPI clock from 1 MHz to 500 kHz. It does not change anything. Slowing it down to 250 kHz causes watchdog resets, as some parts of the code then take too long to execute (>25 ms).

One more thing: I have observed that adding local variables in any source file sometimes makes the problem disappear or reappear. The same goes for initializing uninitialized local variables. Adding a few extra lines of code in any of the files either hides or reveals the problem.

void main(void)
{
    watchdog_init();
    // waiting for power supply to stabilize
    wait; // about 45ms
    hardware_init();
    clear_watchdog();
    application_init();
    clear_watchdog();
    wait; // about 20ms
    test_LED();
    {...}
}

void hardware_init (void)
{
    __asm("NOP");   // why does this help? it can be on any line of the function
    io_init();      // ports initialization

    clk_init();
    timer_init();
    adc_init();

    spi_init();
    LED_init();
    spi_start();
    key_driver_init();
    can_init();
    irq_init();     // set IRQ priorities and global IRQ enable
}
p.h.
  • One of the things with working with hardware is that it is sensitive to timing. Sometimes you just need those NOP instructions to add a few milliseconds of delay to get the timing with the hardware right. – Bart van Ingen Schenau Jan 27 '15 at 22:57
  • Have you tried hooking up a logic analyser to your SPI bus so you can see what's being sent across it during init? – Jules Jan 28 '15 at 09:02
  • Also, I presume you've eliminated the possibility of a hardware fault (e.g. faulty bit in the MCU's memory causing your program to fail if something critical is there but not if the code ends up aligned so that it doesn't matter)? – Jules Jan 28 '15 at 09:07
  • Yes, I did. As to the SPI - I've used an oscilloscope to take a look at the first byte sent after power up. It looked just fine. – p.h. Jan 28 '15 at 12:25
  • First thing I thought when starting to read was "power supply problem". Dismissed it when just one nop was enough. And then I saw that comment in your code. Hmm. – Hans Passant Feb 24 '15 at 00:31

2 Answers

Could be one of many things but two spring to mind.

Timing.

Maybe the wait is not long enough for power to stabilize and not everything is synced to the clock. The NOP gets everything back in sync.

Alignment.

Perhaps the NOP gets your instructions aligned on a 32- or 64-bit boundary expected by the hardware. (We used to do this a lot in mainframe assembler, as I/O operations often expected things to be on doubleword boundaries.)

James Anderson
  • As to the timing: notice there is the long wait loop, then a call to the function, which has the NOP as its first instruction. The delay it adds to the wait loop is negligible. Furthermore, I can move the NOP to before the call, i.e. right after the wait loop, and it still helps. Reducing the SPI clock does not help either. As to the alignment, it sounds interesting, but why does adding or deleting local variables help? And how could the code become misaligned? – p.h. Jan 30 '15 at 09:12
The problem has been solved. It was caused by a trivial bug.

The EEPROM's nHOLD and nCS signals were not initialized immediately after the MCU's reset, but only just before the first use of the EEPROM. As a result they were both 0, i.e. active. This means the EEPROM was selected, but on hold. Meanwhile, another SPI transfer started. After 6 of its 8 CLK pulses, the EEPROM's nHOLD I/O pin was initialized and driven high. The EEPROM was no longer on hold, so it clocked in the last two bits of data meant for another peripheral. Every subsequent operation on the EEPROM found it out of sync with CLK and MOSI.

When I added the NOP, or anything else, the nHOLD 0->1 edge was shifted to occur after the last CLK pulse, so CLK and MOSI stayed in sync.
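The mechanism described above can be illustrated with a small host-side model. It is a deliberately simplified sketch, not taken from any datasheet: it assumes the EEPROM shifts in one bit per CLK pulse whenever nCS is low and nHOLD is high, and ignores CLK while nHOLD is low. The function name `eeprom_stray_bits` and its parameters are made up for this illustration.

```c
#include <stdbool.h>

/* Returns how many bits of a transfer meant for ANOTHER peripheral
 * the EEPROM clocks in, under the simplified model described above.
 *
 * clocks_in_transfer   - CLK pulses in the foreign transfer (8 here)
 * hold_released_after  - number of CLK pulses that have already
 *                        happened when nHOLD goes from 0 to 1 */
int eeprom_stray_bits(int clocks_in_transfer, int hold_released_after)
{
    const bool ncs = false;  /* never driven high: EEPROM stays selected */
    int bits_seen = 0;

    for (int clk = 0; clk < clocks_in_transfer; clk++) {
        /* nHOLD is low until its pin is finally initialized */
        bool nhold = (clk >= hold_released_after);

        if (!ncs && nhold) {
            bits_seen++;     /* EEPROM shifts in a bit it should not see */
        }
    }
    return bits_seen;
}
```

Under these assumptions, `eeprom_stray_bits(8, 6)` gives 2, which is the failing case, while `eeprom_stray_bits(8, 8)` gives 0, which corresponds to the build where the extra NOP pushed the nHOLD edge past the last CLK pulse.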

All I had to do was initialize all of the EEPROM's SPI lines, in particular nHOLD and nCS, right after the MCU reset.
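A minimal sketch of such an early initialization is shown below. It is not the original code: `port_data` and `port_dir` are plain variables standing in for the port data and direction registers, the bit positions are assumed, and `eeprom_pins_early_init` is a hypothetical name.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the port data and direction registers. */
volatile uint8_t port_data;
volatile uint8_t port_dir;          /* 1 = output (assumed convention) */

#define EEPROM_NCS    (1u << 0)     /* assumed bit positions */
#define EEPROM_NHOLD  (1u << 1)

/* Bring nCS and nHOLD to their inactive (high) level as early as
 * possible, before any other SPI traffic can start. */
void eeprom_pins_early_init(void)
{
    /* Preload the output latch first, so the pins are already high
     * at the moment they are switched to outputs. */
    port_data |= (uint8_t)(EEPROM_NCS | EEPROM_NHOLD);
    port_dir  |= (uint8_t)(EEPROM_NCS | EEPROM_NHOLD);
}
```

In a layout like the main() shown in the question, a function of this kind would be called at the very start, before the first wait, so that the EEPROM is deselected and not on hold long before anything else touches the SPI bus.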

p.h.