We are currently developing an application for an MSP430 MCU and are running into some weird problems. We discovered that declaring arrays within a scope after the declaration of "normal" variables sometimes causes what seems to be undefined behavior. Like this:

foo(int a, int *b);

int main(void)
{
    int x = 2;
    int arr[5];

    foo(x, arr);

    return 0;
}

foo is passed a pointer as its second argument that sometimes does not point to the arr array. We verified this by single-stepping through the program and seeing that the value of the arr array-as-a-pointer variable in the main scope is not the same as the value of the b pointer variable in the foo scope. And no, this is not really reproducible; we have just observed this behavior once in a while.

This is observable even before a single line of the foo function is executed: the passed pointer parameter (b) simply does not point to the address of arr.

Swapping the order of the two declarations seems to solve the problem, like this:

foo(int a, int *b);

int main(void)
{
    int arr[5];
    int x = 2;

    foo(x, arr);

    return 0;
}

Does anybody have any input or hints as to why we experience this behavior? Or similar experiences? The MSP430 programming guide specifies that code should conform to the ANSI C89 spec, so I was wondering whether it says that arrays have to be declared before non-array variables.

Any input on this would be appreciated.


Update

@Adam Shiemke and tomlogic:

I'm wondering what C89 specifies about different ways of initializing values within declarations. Are you allowed to write something like:

int bar(void)
{
    int x = 2;
    int y;

    foo(x);
}

And if so, what about:

int bar(int z)
{
    int x = z;
    int y;

    foo(x);
}

Is that allowed? I assume the following must be illegal C89:

int bar(void)
{
    int x = baz();
    int y;

    foo(x);
}

Thanks in advance.


Update 2

Problem solved. Basically we were disabling interrupts before calling the function (foo) and after declarations of variables. We were able to reproduce the problem in a simple example, and the solution seems to be to add a _NOP() statement after the disable interrupt call.

If anybody is interested I can post the complete example reproducing the problem, and the fix?
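
A minimal sketch of the pattern, not the complete example. The macros `DISABLE_INTERRUPTS`, `ENABLE_INTERRUPTS` and `NOP` are host-side stand-ins (assumptions) so the code compiles off-target; on the MSP430 they would be replaced by the toolchain's own interrupt and no-operation intrinsics. Re-enabling interrupts after the call, and the body of `foo`, are also assumptions made for illustration:

```c
#include <assert.h>
#include <stddef.h>

/* Host-side stand-ins (assumptions) so the sketch compiles off-target.
 * On the MSP430, replace them with the toolchain's real intrinsics. */
static int interrupts_enabled = 1;
static int nops_executed = 0;
#define DISABLE_INTERRUPTS() (interrupts_enabled = 0)
#define ENABLE_INTERRUPTS()  (interrupts_enabled = 1)
#define NOP()                (nops_executed++)

/* Interrupt state observed inside foo(); -1 means foo() has not run yet. */
static int foo_saw_interrupts_enabled = -1;

/* Stand-in for the real foo(): records the interrupt state and writes a. */
static void foo(int a, int *b)
{
    assert(b != NULL);
    foo_saw_interrupts_enabled = interrupts_enabled;
    b[0] = a;
}

/* Declarations first, then the disable, the NOP, and only then the call. */
int critical_call(void)
{
    int x = 2;
    int arr[5];

    DISABLE_INTERRUPTS();
    NOP();                 /* workaround: one padding instruction after the disable */

    foo(x, arr);

    ENABLE_INTERRUPTS();   /* assumption: interrupts are restored after the call */
    return arr[0];
}
```

The only change that mattered for us was the `NOP()` line directly after the disable; the rest is scaffolding.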

Thanks for all the input on this.

Bjarke Freund-Hansen
  • The value of the pointer in the function should be the address of the array variable in main. Is this what you mean? – CB Bailey Mar 25 '10 at 11:37
  • There may be a bug with the compiler, but it would probably be reproducible then. What compiler are you using? – mjh2007 Mar 25 '10 at 13:55
  • It would help if you also specified which tools and versions you were using. I use the MSP430 daily but have never observed this. Are you sure that accessing the pointer inside foo() actually changes the wrong place in memory, or might it just be a debugging value? Also, have you checked the errata for the chip (you didn't specify which one)? – eaanon01 Mar 25 '10 at 14:22
  • What you're describing sounds to be a bug in the compiler, but this is such basic stuff that I'd be reluctant to believe that's really the case. Maybe you can post some code and the disassembled compiler output for when you see the problem? – Michael Burr Mar 25 '10 at 15:49
  • Another thing I wonder: does the problem also get 'fixed' if you pass `foo(x, &arr[0]);`? – Michael Burr Mar 25 '10 at 17:36
  • All the examples are valid, including the last one. Whose compiler are you using? Are you attempting to debug code with optimisation settings applied? – Clifford Mar 26 '10 at 20:09

7 Answers

3

That looks like a compiler bug.

If you use your first example (the problematic one) and write your function call as foo(x, &arr[0]);, do you see the same results? What about if you initialize the array like int arr[5] = {0};? Neither of these should change anything, but if they do it would hint at a compiler bug.
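
A minimal sketch of how both variations could be checked in code rather than in the debugger. The names `last_b` and `check_pointer_passing` are hypothetical, and this `foo` is only a stand-in that records the pointer it receives:

```c
#include <assert.h>
#include <stddef.h>

/* Address most recently received by foo(), kept so the caller can compare. */
static int *last_b = NULL;

/* Stand-in for the real foo(): it only records the pointer it was given. */
static void foo(int a, int *b)
{
    (void)a;               /* unused in this stand-in */
    assert(b != NULL);
    last_b = b;
}

/* Returns 1 if foo() received the address of arr both times, 0 otherwise. */
int check_pointer_passing(void)
{
    int x = 2;
    int arr[5] = {0};
    int ok;

    foo(x, arr);           /* array name decays to &arr[0] */
    ok = (last_b == arr);

    foo(x, &arr[0]);       /* explicit form; must give the same address */
    ok = ok && (last_b == &arr[0]);

    last_b = NULL;         /* do not keep a pointer to a dead local */
    return ok;
}
```

If this ever returns 0 on the target, the address was altered between `main` and `foo`, which would point at the compiler or the hardware rather than at the C source.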

bta
3

In your updated question:

Basically we were disabling interrupts before calling the function (foo) and after declarations of variables. We were able to reproduce the problem in a simple example, and the solution seems to be to add a _NOP() statement after the disable interrupt call.

It sounds as if the interrupt disabling intrinsic/function/macro (or however interrupts are disabled) might be causing an instruction to be 'skipped' or something. I'd investigate whether it is coded/working correctly.

Michael Burr
  • Try to look at the errata sheet of the MCU (http://focus.ti.com/docs/prod/folders/print/msp430f5438.html). It is filled with conditions that might corrupt the PC, and the workaround is to insert NOP instructions after the affected conditions. I'm considering just always inserting NOP instructions after I do anything involving interrupts or low-power mode. – Bjarke Freund-Hansen Mar 30 '10 at 08:07
  • @bjarkef: after a quick glance at the errata, it sure looks like your workaround might well be the necessary fix. I guess that I've been fortunate in dealing with CPUs that seem to have fewer uncertainties in how the program counter is handled in branches and interrupt handling. Yikes! – Michael Burr Mar 30 '10 at 18:50
2

Both examples look to be conforming C89 to me. There should be no observable difference in behaviour assuming that foo isn't accessing beyond the bounds of the array.

CB Bailey
  • It is not accessing beyond the array. In the first example the pointer variable b in the foo function scope is literally not pointing to the same address as the array-as-a-pointer variable arr in the main function. This is clearly not correct behavior, so I am curious if anybody else has seen this before, and how it might be related to the order of variable declarations. – Bjarke Freund-Hansen Mar 25 '10 at 11:42
  • @bjarkef: It shouldn't. It *might* affect the order that the variables are allocated in on the stack but that shouldn't make any difference. If `foo` isn't causing any undefined behaviour in any other way then you must have a compiler, implementation or hardware issue. – CB Bailey Mar 25 '10 at 11:47
  • It should be possible to create a reproducible test case for the problem, too - even if buggy, the compiler should at least be deterministic. – caf Mar 25 '10 at 12:26
  • I might try to create a simple reproducible test case at some point, but for now I'm mostly interested in hearing if anybody else has been dealing with a problem with the same symptoms, and what the cause was then? – Bjarke Freund-Hansen Mar 25 '10 at 13:33
  • I don't have experience with the MSP430 compiler, but on the off chance that it does happen again, take a look at the generated list file (assuming you have one) and/or memory map to see if the generated code is passing the correct address or not. – tomlogic Mar 25 '10 at 19:05
2

For C89, the variables need to be declared in a list at the start of the scope prior to any assignment. C99 allows you to mix assignment and declaration. So:

{ 
    int x; 
    int arr[5];

    x=5;
...

is legal C89 style. I'm surprised your compiler didn't throw some sort of error on that if it doesn't support C99.

  • C89/C90 allows for variable initializers with the declarations. You can't mix declarations and code though. It would be interesting to see if the problem went away without the initializer -- could be a compiler error related to using that feature. – tomlogic Mar 25 '10 at 19:03
  • Hi. Thanks for the input on the C89 spec., please see updated question. I'm a bit unsure about what exactly is allowed under C89 regarding variable initialization. – Bjarke Freund-Hansen Mar 26 '10 at 11:54
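
A minimal sketch of the three initializer forms asked about in the updated question. As far as C89 is concerned, a scalar automatic variable may be initialized with any expression, including a parameter or a function call; it is static objects and brace-enclosed aggregate initializers that require constant expressions. The function names and the value returned by `baz` are hypothetical:

```c
#include <assert.h>

/* Hypothetical helper standing in for baz() from the question. */
static int baz(void)
{
    return 7;
}

int bar_constant(void)
{
    int x = 2;             /* constant initializer */
    int y;

    y = x + 1;             /* statements start only after all declarations */
    return y;
}

int bar_parameter(int z)
{
    int x = z;             /* initialized from a parameter */
    int y;

    y = x + 1;
    return y;
}

int bar_call(void)
{
    int x = baz();         /* initialized from a function call */
    int y;

    y = x + 1;
    return y;
}

/* Exercises all three forms; each assert documents the expected result. */
void check_initializers(void)
{
    assert(bar_constant() == 3);
    assert(bar_parameter(4) == 5);
    assert(bar_call() == 8);
}
```

Compiling this with a strict C89 mode, for example GCC's `-std=c89 -pedantic`, is one way to confirm that none of the three forms is rejected.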
2

You should be able to determine if it is a compiler bug based on the assembly code that is produced. Is the assembly different when you change the order of the variable declarations? If your debugger allows you, try single stepping through the assembly.

If you do find a compiler bug, also check your optimization settings. I have seen bugs like this introduced by the optimizer.

semaj
2

Assuming the real code is much more complex, here are some things I would check. Keep in mind they are guesses:

Could you be overflowing the stack on occasion? If so, could this be some artifact of "stack defense" by the compiler/uC? Does the incorrect value of b fall inside a predictable memory range? If so, does that range have any significance (inside the stack, etc.)?

Does the MSP430 have different ranges for RAM and ROM addressing? That is, is the address space for RAM 16-bit while the program address space is 24-bit? PICs have such an architecture, for example. If so, it is feasible that arr is getting allocated as ROM (24-bit) while the function expects a pointer to RAM (16-bit). The code would work when arr is allocated in the first 16 bits of address space but break if it is above that range.
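
One cheap way to test the corruption guess is to put guard words on either side of the array and check them after the call. This is a minimal sketch under assumptions: the `struct guarded` wrapper, the `GUARD` pattern and the function names are hypothetical, `foo` is a well-behaved stand-in, and the check only catches writes that land directly next to the array, not a stack overflow in general:

```c
#include <assert.h>
#include <stddef.h>

#define GUARD 0x5A5A       /* arbitrary pattern that fits in a 16-bit int */

/* Hypothetical wrapper: the array under test with a guard word on each side. */
struct guarded {
    int before;
    int arr[5];
    int after;
};

/* Stand-in for the real foo(): a well-behaved callee that stays in bounds. */
static void foo(int a, int *b)
{
    int i;

    assert(b != NULL);
    for (i = 0; i < 5; i++) {
        b[i] = a;
    }
}

/* Returns 1 if both guard words still hold the pattern, 0 otherwise. */
int guards_intact(const struct guarded *g)
{
    return g->before == GUARD && g->after == GUARD;
}

/* Returns 1 if the call left the guard words untouched. */
int call_with_guards(void)
{
    int x = 2;
    struct guarded g;

    g.before = GUARD;
    g.after = GUARD;

    foo(x, g.arr);

    return guards_intact(&g);
}
```

With optimization enabled, the guard fields may need to be declared `volatile` so the check is not folded away.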

Mark
  • Definitely, the first thing I would check is stack corruption. This could be the classic stack overflow, but also a runaway pointer corrupting the stack. – Miro Samek Mar 27 '10 at 14:08
1

Maybe somewhere in your program you have an illegal memory write that corrupts your stack.

Did you have a look at the disassembly?

codymanix