Transfer data from ISR/callback to a thread with RTOS

Question

I am using ADC with DMA that captures analog vlaues and generates a callback when transfer is complete. I then, will be transfering the data to a thread to process the data as processing takes some time and I don't want to block callback function.

Buffer is of length 200, I am using the buffer as ping pong buffer and am generating a callback on ADC half complete and full complete events so that thee should be no overlap of data in same buffer.

Below is my current implementation on STM32 with CMSIS RTOS 2

#define BUFFER_SIZE 100
static int16_t buffer[BUFFER_SIZE*2] = {0};

static volatile int16_t *p_buf[2] = {&(buffer[0]), &(buffer[BUFFER_SIZE])};

typedef struct
{
    void *addr;
    uint32_t len;
} msg_t;

void HAL_ADC_ConvHalfCpltCallback(ADC_HandleTypeDef *hadc)
{
    msg_t msg;
    msg.addr = (void *)p_buf[0];
    msg.len = BUFFER_SIZE;

    osMessageQueuePut(queue, &msg, 0, 0);
}

void HAL_ADC_ConvCpltCallback(ADC_HandleTypeDef *hadc)
{
    msg_t msg;
    msg.addr = (void *)p_buf[1];
    msg.len = BUFFER_SIZE;

    osMessageQueuePut(queue, &msg, 0, 0);
}

static void process_thread(void *argument)
{
    msg_t msg;

    queue = osMessageQueueNew(4, sizeof(msg_t), NULL);

    while (1)
    {
        osMessageQueueGet(queue, &msg, 0, osWaitForever);
        // process data
    }
}

What is recommended way to transfer to data from half buffer to a thread from callback/ISR using CMSIS RTOS 2?
Currently queue size is set to 4, for some reason if processing thread takes too much of time; the queue becomes useless as buffer pointer will point to stale oe ongoing data. How to overcome this issue?

Is osMessageQueuePut() safe to use from an interrupt-handler, (ie. does not make any OS calls that can block)? — Martin James, Dec 21 '22 at 23:04
@MartinJames as long as the timeout is 0, it can call in an ISR (per the documentation) — Russ Schultz, Dec 21 '22 at 23:16
Why are you using `int16_t`? ADC values are not typically negative. What's the ADC resolution? — Lundin, Dec 22 '22 at 08:32
What specific STM32 are you using? For a STM32F7xx (Cortex-M7) you have to consider cache coherency when using DMA. Simpler to use the MMU to mark the DMA buffer non-cachable. For simpler STM32 parts it is not an issue, the point is that simply specifying "STM32" may be insufficient. — Clifford, Dec 22 '22 at 11:04
... Moreover some STM32 have a DMA controller that supports "double-buffering" - that differs from the HT/FC circular mode in that the two buffers need not be contiguous, so you can switch the buffer to a "fresh" buffer on every interrupt. — Clifford, Dec 22 '22 at 11:18
You have somewhat over-complicated your buffering with `p_buf`. You could simply have: `static volatile int16_t buffer[2][BUFFER_SIZE] = { {0}, {0} } ;` and use `buffer[0]` and `buffer[1]` in place of `p_buf[0]` and `p_buf[1]`. Though as @Lundin points out `uint16_t` is probably more appropriate - especially if you are using the left-aligned mode. Also you do not need to (and should not) cast to `void*` when assigning a pointer of any type to a void pointer. — Clifford, Dec 22 '22 at 11:50
I am using STM324L96 series controller. `int16_t` can cause issue which I didn't think of. Thank you for your inputs. Will go through the solutions and get back. — Android_dev, Dec 23 '22 at 03:57

Clifford · Answer 1 · 2022-12-22T16:13:54.343

If you are using double buffering but only passing a pointer directly to the DMA buffer, there is no purpose in having a queue length of more than 1 (one buffer being processed, one buffer being written) and if your thread cannot process that in time, it is a flawed software design or unsuitable hardware (too slow). With a queue length of 1, if the receiving task has not completed processing in time osMessageQueuePut in the ISR will return osErrorResource - better to detect the overrun that to let it happen with undefined consequences.

Generally you need to pass the data to a thread that is sufficiently deterministic such that it is guaranteed to meet deadlines. If you have some occasional non-deterministic or slow processing, then that should be deferred to yet another lower priority task rather then disturbing the primary signal processing.

A simple solution is to copy the data to the message queue rather than passing a pointer. i.e. a queue of buffers rather then a queue of pointers to buffers. That will increase your interrupt processing time, for the memcpy but will still be deterministic (i.e. a constant processing time), and the aim is to meet deadlines rather than necessarily be "as fast as possible". It is robust and if you fail to meet deadlines you will get no data (a gap in the signal) rather than inconsistent data. That condition is then detectable by virtue of the queue being full and osMessageQueuePut returning osErrorResource.

A more complex but more efficient solution is to use the DMA controller's support for double-buffering (not available on all STM32 parts, but you have not specified). That differs from the circular half/full transfer mode in that the two buffers are independent (need not be contiguous memory) and can be changed dynamically. In that case you would have a memory block pool with as many blocks as your queue length. Then you assign two blocks as the DMA buffers and when each block becomes filled, in the ISR, you switch to the next block in the pool and pass the pointer to the just filled block on to the queue. The receiving task must return the received block back to the pool when it has completed processing it.

In CMSIS RTOS2 you can use the memory pool API to achieve that, but it is simple enough to do in any RTOS using a message queue pre-filled with pointers to memory blocks. You simply allocate by taking a pointer from the queue, and de-allocate by putting the pointer back on the queue. Equally however in this case you could simply have an array of memory blocks and maintain a circular index since the blocks will be used and returned sequentially and used exclusively by this driver. In that case overrun detection is when the queue is full rather than when block allocation fails. But if that is happening regardless of queue length, you have a scheduling/processing time problem.

To summarise, possible solutions are:

Ensure the receiving thread will process one buffer before the next is available (fast, deterministic and appropriate priority with respect to other threads), and use an appropriate queue length such that any overrun is detectable.
Use a queue of buffers rather then a queue of pointers and memcpy the data to the message and enqueue it.
Use true double-buffering (if supported) and switch DMA buffers dynamically from a memory pool.

One final point. If you are using an STM32 part with a data-cache (e.g. STM32F7xx Cortex-M7 parts), your DMA buffers must be located in a non-cachable region (by MPU configuration) - you will otherwise slow down your processor considerably if you are constantly invalidating the cache to read coherent DMA data, and unlikely to get correct data if you don't. If you use a CMSIS RTOS memory pool in that case, you will need to use the osMemoryPoolNew attr structure parameter to provide a suitable memory-block rather then using the kernel memory allocator.

Nice detailed options:) – Martin James Dec 22 '22 at 13:21 — Martin James, Dec 22 '22 at 13:21

score 1 · Answer 2 · answered Dec 21 '22 at 23:34

If you're stuck with half buffer notifications because of hardware limitation, one possibility is to copy from the half buffer to another buffer from a larger pool.

You'll (probably) eventually need to do this anyways so you don't lose data (as you're experiencing) and so you can bridge the non-cached/cache gap. Your hardware pingpong DMA buffer is going to be necessarily be non-cached and you'll want whatever buffer you do work with, particularly if you're doing filtering or other postprocessing on it, to be cached.

You can wait on a queue in the ISR (same stipulation...timeout must be 0), so have the ISR get a buffer from the "empty" queue, fills it, then puts it in the "filled" queue. Application takes from the "filled" queue, processes, returns to "empty" queue.

If the ISR ever encounters a situation where it can't get an "empty" buffer, you need to decide how to handle that. (skip? halt?) It basically means the application repeatedly ran over its deadlines until the queue emptied. If its a transient load, then you can increase the queue depth and use more buffers to cover the transient load. If it just slowly gets there and can't recover, you need to optimize your application or decide how to gracefully drop data because you can't process data fast enough in general.

You can get away with using ring buffers where only one side modifies the write pointer and the other the read pointer, but if you've got os queues that work across ISRs, it makes the code cleaner and more obvious what's going on.

It is unfortunate if the hardware dors no allow pool buffers to be directly loaded by the DMA and so force bulk copies in the completion handler:(( — Martin James, Dec 22 '22 at 08:45
@MartinJames : that is not a given - not all STM32 DMA controllers are the same and some support true double-buffering using independent non-contiguous and re-assignable buffers. Unfortunately the OP has not specified the precise part. — Clifford, Dec 22 '22 at 11:40
While it is a consideration, it is not specified which STM32 so we do not know whether it has data memory caching or an MPU. Most don't, Cortex-M7 variants do. CMSIS RTOS2 has a memory pool API that can be used rather than use an "empty" queue (though no doubt under the hood it is the same thing and just a "convenience" interface). — Clifford, Dec 22 '22 at 12:20

Transfer data from ISR/callback to a thread with RTOS

2 Answers2