4

I've got some C code I'm targeting for an AVR. The code is being compiled with avr-gcc, basically the gnu compiler with the right backend.

What I'm trying to do is create a callback mechanism in one of my event/interrupt driven libraries, but I seem to be having some trouble keeping the value of the function pointer.

To start, I have a static library. It has a header file (twi_master_driver.h) that looks like this:

#ifndef TWI_MASTER_DRIVER_H_
#define TWI_MASTER_DRIVER_H_

#define TWI_INPUT_QUEUE_SIZE 256

// define callback function pointer signature
typedef void (*twi_slave_callback_t)(uint8_t*, uint16_t);

typedef struct {
    uint8_t buffer[TWI_INPUT_QUEUE_SIZE];
    volatile uint16_t length; // currently used bytes in the buffer
    twi_slave_callback_t slave_callback;
} twi_global_slave_t;

typedef struct {
    uint8_t slave_address;
    volatile twi_global_slave_t slave;
} twi_global_t;

void twi_init(uint8_t slave_address, twi_global_t *twi, twi_slave_callback_t slave_callback);

#endif

Now the C file (twi_driver.c):

#include <stdint.h>
#include "twi_master_driver.h"

void twi_init(uint8_t slave_address, twi_global_t *twi, twi_slave_callback_t slave_callback)
{
    twi->slave.length = 0;
    twi->slave.slave_callback = slave_callback;

    twi->slave_address = slave_address;

    // temporary workaround <- why does this work??
    twi->slave.slave_callback = twi->slave.slave_callback;
}

void twi_slave_interrupt_handler(twi_global_t *twi)
{
    (twi->slave.slave_callback)(twi->slave.buffer, twi->slave.length);

    // some other stuff (nothing touches twi->slave.slave_callback)
}

Then I build those two files into a static library (.a) and construct my main program (main.c) #include #include #include #include #include "twi_master_driver.h"

//  ...define microcontroller safe way for mystdout ...

twi_global_t bus_a;

ISR(TWIC_TWIS_vect, ISR_NOBLOCK)
{
    twi_slave_interrupt_handler(&bus_a);
}

void my_callback(uint8_t *buf, uint16_t len)
{
    uint8_t i;

    fprintf(&mystdout, "C: ");
    for(i = 0; i < length; i++)
    {
        fprintf(&mystdout, "%d,", buf[i]);
    }
    fprintf(&mystdout, "\n"); 
}

int main(int argc, char **argv)
{
    twi_init(2, &bus_a, &my_callback);

    // ...PMIC setup...

    // enable interrupts.
    sei();

    // (code that causes interrupt to fire)

    // spin while the rest of the application runs...
    while(1){
        _delay_ms(1000);
    }
    return 0;
}

I carefully trigger the events that cause the interrupt to fire and call the appropriate handler. Using some fprintfs I'm able to tell that the location assigned to twi->slave.slave_callback in the twi_init function is different than the one in the twi_slave_interrupt_handler function.

Though the numbers are meaningless, in twi_init the value is 0x13b, and in twi_slave_interrupt_handler when printed the value is 0x100.

By adding the commented workaround line in twi_driver.c:

twi->slave.slave_callback = twi->slave.slave_callback;

The problem goes away, but this is clearly a magic and undesirable solution. What am I doing wrong?

As far as I can tell, I've marked appropriate variables volatile, and I've tried marking other portions volatile and removing the volatile markings. I came up with the workaround when I noticed removing fprintf statements after the assignment in twi_init caused the value to be read differently later on.

The problem seems to be with how I'm passing around the function pointer -- and notably the portion of the program that is accessing the value of the pointer (the function itself?) is technically in a different thread.

Any ideas?

Edits:

  • resolved typos in code.

  • links to actual files: http://straymark.com/code/ [test.c|twi_driver.c|twi_driver.h]

  • fwiw: compiler options: -Wall -Os -fpack-struct -fshort-enums -funsigned-char -funsigned-bitfields -mmcu=atxmega128a1 -DF_CPU=2000000UL

  • I've tried the same code included directly (rather than via a library) and I've got the same issue.

Edits (round 2):

  • I removed all the optimizations, without my "workaround" the code works as expected. Adding back -Os causes an error. Why is -Os corrupting my code?
Mark Elliot
  • 75,278
  • 22
  • 140
  • 160
  • Probably doesn't change anything, but try removing the `&` from the first line in main: `twi_init(2, &bus_a, my_callback);` – pmg Dec 23 '09 at 01:50
  • Going out for several hours but I'll check later and try again if not solved: Brief ideas; Does it work if you don't use library? (i.e. put everything shown in one file). Can you really fprintf() from an interrupt (sounds dangerous)?. Repost showing everything; What you have isn't quite perfect in the details, eg length vs len in my_callback(), where are you enabling interrupts ? is it twi_driver.h or twi_master_driver.h ? Sorry to be not much help yet, as I say I'll try again later. – Bill Forster Dec 23 '09 at 02:34
  • How does the assembly generated by avr-gcc differ with and without the workaround line? i.e. could you post the results of `avr-gcc -S twi_driver.c` for both versions? Which AVR are you targeting? – mrkj Dec 23 '09 at 02:46
  • Is this the AVR32 or 8-bit AVR architecture that you're talking about? – caf Dec 23 '09 at 03:41
  • Some clarifications: 8-bit AVR, target = ATxmega128A1, fprintf works from the interrupt. Tried removing & from the function name without any issue. I'll post the full files on the web in a few minutes. – Mark Elliot Dec 23 '09 at 03:46
  • is this with or without compiler optimizations? – Eli Bendersky Dec 23 '09 at 03:49
  • I should add, getting this code to run on your own stk600/atxmega128a1 is going to take some wiring, for those of you who actually want to run it I can provide a wiring diagram. – Mark Elliot Dec 23 '09 at 04:02
  • Sorry I had another good look and I can't see the problem. All I can offer is some general advice. Try reducing the problem to the minimum possible, all in one file, with the fewest possible dependencies. There is a lot of visible noise at the moment, and there's even some I still can't see, like the definition of TWI_t. At the moment you're printing out the callback address in only one place - is it correct there or not? Good luck, wish I could be of more help. – Bill Forster Dec 23 '09 at 08:25

3 Answers3

2

Just a hunch, but what happens if you switch these two lines around:

twi->slave.slave_callback = slave_callback;
twi->slave.length = 0;

Does removing the -fpack-struct gcc flag fix the problem? I wonder if you haven't stumbled upon a bug where writing that length field is overwriting part of the callback value.


It looks to me like with the -Os optimisations on (you could try combinations of the individual optimisations enabled by -Os to see exactly which one is causing it), the compiler isn't emitting the right code to manipulate the uint16_t length field when its not aligned on a 2-byte boundary. This happens when you include a twi_global_slave_t inside a twi_global_t that is packed, because the initial uint8_t member of twi_global_t causes the twi_global_slave_t struct to be placed at an odd address.

If you make that initial field of twi_global_t a uint16_t it will probably fix it (or you could turn off struct packing). Try the latest gcc build and see if it still happens - if it does, you should be able to create a minimal test case that shows the problem, so you can submit a bug report to the gcc project.

caf
  • 233,326
  • 40
  • 323
  • 462
  • seems like this is on the right track, I swapped the order and that fixes the problem. Why would the length field be overwriting the callback field? – Mark Elliot Dec 26 '09 at 19:42
  • further complicating things, if I reorder the struct elements the problem also seems to go away. – Mark Elliot Dec 26 '09 at 19:49
1

This really sounds like a stack/memory corruption issue. If you run avr-size on your elf file, what do you get? Make sure (data + bss) < the RAM you have on the part. These types of issues are very difficult to track down. The fact that removing/moving unrelated code changes the behavior is a big red flag.

Rob Curtis
  • 405
  • 2
  • 6
0

Replace "&my_callback" with "my_callback" in function main().

Because different threads access the callback address, try protecting it with a mutex or read-write lock.

If the callback function pointer isn't accessed by a signal handler, then the "volatile" qualifier is unnecessary.

Steve Emmerson
  • 7,702
  • 5
  • 33
  • 59
  • I've tried that, it makes no difference. The value is set (and unchanged) before the interrupts are enabled, so no access mutex read-write locks are required (access controls are by convention). – Mark Elliot Dec 23 '09 at 03:43
  • 1
    Since `my_callback` is a function, there's no difference between `&my_callback` and `my_callback`. And since the "different thread" is actually just code executing in interrupt context, locking will simply deadlock. Having interrupts disabled while the `twi_init` function runs is what's needed, which is ensured if interrupts aren't enabled until after that point. – caf Dec 23 '09 at 03:56
  • @caf, that's what I have. @Steve Emmerson: the callback function pointer is accessed by a signal handler, that's the point, it's not *modified* by that handler or any other after the intialization, though. – Mark Elliot Dec 23 '09 at 04:01
  • Keep the "volatile", then. I still suspect misbehavior regarding sequence points. Try decreasing the level of optimization. – Steve Emmerson Dec 23 '09 at 04:54
  • I would also try changing the "-Os" optimization option to "-O0" to see if the problem is caused by optimization. – Steve Emmerson Dec 23 '09 at 16:10