How to decrease SPI overhead time for STM32L4 HAL library

Question

I am using an STM32L476RG board and the HAL SPI functions:

HAL_SPI_Transmit(&hspi2, &ReadAddr, 1, HAL_MAX_DELAY);
HAL_SPI_Receive(&hspi2, pBuffer, 4, HAL_MAX_DELAY);

I need to receive data from the accelerometer's buffer at maximum speed, and I have a problem with the delay in these functions. As you can see on the oscilloscope screenshots, there are several microseconds during which nothing happens. I have no idea how to minimize the transmission gap.

I tried using the HAL_SPI_Receive_DMA function and the delay was even bigger. Do you have any idea how to solve this problem using HAL functions, or any pointers on how I could write my own SPI function without these delays?

followed Monica to Codidact · Answer 1 · 2018-10-15T11:35:35.950

TL;DR Don't use HAL, write your transfer functions using the Reference Manual.

HAL is hopelessly overcomplicated for time-critical tasks (among others). Just look at the HAL_SPI_Transmit() function: it's over 60 lines of code before it actually touches the Data Register. HAL first marks the port access structure as busy even when there is no multitasking OS in sight, validates the function parameters, stores them in the hspi structure for no apparent reason, then goes on to figure out what mode the SPI is in, etc. It's not necessary to check timeouts in SPI master mode either, because the master controls all bus timings. If it can't get a byte out in a finite amount of time, then the port initialization is wrong, period.

Without HAL, it's a lot simpler. First, figure out what should go into the control registers, and set CR1 and CR2 accordingly.

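void SPIx_Init(void) {
    /* Raise RXNE as soon as one byte (8 bits) is in the receive FIFO; the default
       16-bit threshold would complicate things when there is an odd number of bytes
       to transfer. CR2 is written first, while the peripheral is still disabled. */
    SPIx->CR2 = SPI_CR2_FRXTH;
    /* full duplex master, 8 bit transfer, default phase and polarity,
       software slave management, then enable the peripheral */
    SPIx->CR1 = SPI_CR1_MSTR | SPI_CR1_SPE | SPI_CR1_SSM | SPI_CR1_SSI;
}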

This initialization assumes that Slave Select (NSS or CS#) is handled by separate GPIO pins. If you want CS# managed by the SPI peripheral, then look up Slave select (NSS) pin management in the Reference Manual.

Note that a full-duplex SPI connection cannot just transmit or just receive; it always does both simultaneously. If the slave expects one command byte and answers with four bytes of data, that's a 5-byte transfer: the slave will ignore the last 4 bytes sent by the master, and the master should ignore the first byte it receives.
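
As a rough illustration of that framing, here is a minimal sketch. The names `spi_xfer_fn`, `read4` and `fake_slave_xfer` are hypothetical and not part of HAL or of this answer's code; the callback is assumed to have the same shape as the `SPIx_Transfer` function shown further down, and the fake slave exists only so the framing can be tried on a PC.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical callback type: any full-duplex transfer that clocks out
   `count` bytes from `out` while storing `count` received bytes in `in`. */
typedef void (*spi_xfer_fn)(uint8_t *out, uint8_t *in, int count);

/* Read the 4 data bytes a slave sends after a 1-byte command.
   The whole exchange is a single 5-byte full-duplex transfer. */
void read4(spi_xfer_fn xfer, uint8_t cmd, uint8_t data[4])
{
    uint8_t tx[5] = { cmd, 0, 0, 0, 0 };  /* command + 4 dummy bytes */
    uint8_t rx[5];

    xfer(tx, rx, 5);
    memcpy(data, &rx[1], 4);  /* rx[0] arrived while the command was sent */
}

/* Host-side stand-in for a slave (assumption, for off-target experiments):
   it answers every command with the bytes 0x11 0x22 0x33 0x44. */
void fake_slave_xfer(uint8_t *out, uint8_t *in, int count)
{
    static const uint8_t reply[4] = { 0x11, 0x22, 0x33, 0x44 };
    (void)out;
    for (int i = 0; i < count; i++)
        in[i] = (i == 0) ? 0xFF : reply[(i - 1) % 4];
}
```

On the target, the call would be something like `read4(SPIx_Transfer, ReadAddr, pBuffer);`, with chip select driven low before and high after.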

A very simple transfer function would be

void SPIx_Transfer(uint8_t *outp, uint8_t *inp, int count) {
    while(count--) {
        while(!(SPIx->SR & SPI_SR_TXE))
            ;
        *(volatile uint8_t *)&SPIx->DR = *outp++;
        while(!(SPIx->SR & SPI_SR_RXNE))
            ;
        *inp++ = *(volatile uint8_t *)&SPIx->DR;
    }
}

It can be further optimized when needed by making use of the SPI FIFO, interleaving writes and reads so that the transmitter is always kept busy.
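
One possible shape for such an interleaved loop is sketched below. It is an untuned illustration, not a drop-in replacement: `spi_regs_t`, `SR_RXNE`, `SR_TXE`, `MAX_IN_FLIGHT` and `spi_transfer_fifo` are made-up names standing in for `SPI_TypeDef` and the `SPI_SR_*` masks from the device header, and the limit of 4 bytes in flight assumes the 4-byte receive FIFO of this SPI peripheral with `FRXTH` set as above.

```c
#include <stdint.h>

/* Stand-in for the two registers used here (assumption). On the target,
   use SPI_TypeDef and the SPI_SR_* masks from the device header instead. */
typedef struct {
    volatile uint32_t SR;
    volatile uint32_t DR;
} spi_regs_t;

#define SR_RXNE (1u << 0)   /* at least one received byte is waiting */
#define SR_TXE  (1u << 1)   /* transmit FIFO has room               */

#define MAX_IN_FLIGHT 4     /* assumed receive FIFO depth in bytes   */

void spi_transfer_fifo(spi_regs_t *spi, const uint8_t *out, uint8_t *in, int count)
{
    int to_send = count;    /* bytes not yet written to DR */
    int to_recv = count;    /* bytes not yet read from DR  */

    while (to_recv > 0) {
        /* Queue the next byte if the transmitter has room and the bytes
           already in flight cannot overflow the receive FIFO. */
        if (to_send > 0 && (to_recv - to_send) < MAX_IN_FLIGHT
                && (spi->SR & SR_TXE)) {
            *(volatile uint8_t *)&spi->DR = *out++;
            to_send--;
        }
        /* Drain one received byte if something is outstanding and ready. */
        if (to_recv > to_send && (spi->SR & SR_RXNE)) {
            *in++ = *(volatile uint8_t *)&spi->DR;
            to_recv--;
        }
    }
}
```

Unlike the simple version, the loop does not wait for each byte to come back before writing the next one, so the shift register can be reloaded from the FIFO without a pause between bytes.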

If speed is critical, don't use generalized functions, or make sure they can be inlined when you do. Use a compiler with link-time optimization enabled, and optimize for speed (quite obviously).

You can use the Low Layers API instead of HAL. There is a lot less boilerplate code. – phodina, Oct 14 '18 at 20:26
Its functions are not very suitable for SPI. SPI at higher speeds needs DMA to be efficient and there is no other way. – 0___________, Oct 14 '18 at 20:49
Hi, many thanks for the tips! I don't have much experience in embedded programming. I will try to write this code myself, but it would significantly speed up my work if anyone shared a ready-to-use code example. Could you point me to an example with the LL API or DMA? – awyzlin, Oct 15 '18 at 06:02
Look at the example folder in the STM Library folder. It provides examples for both HAL and LL implementations. – A.R.C., Oct 15 '18 at 06:15
DMA can speed up the transfer of long SPI transactions. Your transaction is only five bytes. I doubt that you can gain anything with DMA transfers. – Codo, Oct 15 '18 at 08:01
@berendi thank you for your help, the `SPIx_Transfer` function works great. Do you have an idea how to implement a similar function using DMA? – awyzlin, Oct 17 '18 at 16:03
@awyzlin Look up the stream and channel numbers (for both transmit and receive) in the DMA request mapping table, set up the DMA channel registers except `CCR`, then follow the description in the *SPI functional description / Data transmission and reception procedure / Communication using DMA* section of the Reference Manual. – followed Monica to Codidact, Oct 17 '18 at 20:50

JMA · Answer 2 · 2018-10-15T12:58:31.073

You can use HAL_SPI_TransmitReceive(&hspi2, ReadAddr, pBuffer, 1 + 4, HAL_MAX_DELAY); instead of a HAL_SPI_Transmit followed by a HAL_SPI_Receive. This avoids the gap between transmit and receive. Note that both buffers must then be 1 + 4 bytes long: ReadAddr has to point to the address byte followed by four dummy bytes, and the first byte in pBuffer is not data. You can also try changing the compilation settings to optimize for speed. Also check the accelerometer's datasheet; maybe you can read the whole buffer in a single frame, something like this: HAL_SPI_TransmitReceive(&hspi2, ReadAddr, pBuffer, 1 + (4 * numOfSamples), HAL_MAX_DELAY);
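
To make the buffer bookkeeping for such a multi-sample frame concrete, here is a small sketch. The helper names `burst_frame_len` and `burst_sample` are hypothetical, the 4 bytes per sample come from the question, and whether the accelerometer supports this kind of burst read at all has to be checked in its datasheet. The length check is there because the Size parameter of HAL_SPI_TransmitReceive is a uint16_t.

```c
#include <stddef.h>
#include <stdint.h>

#define SAMPLE_BYTES 4u   /* bytes per sample, as in the question */

/* Frame length for one address byte followed by n_samples samples.
   Returns 0 if the length would not fit into a uint16_t. */
uint16_t burst_frame_len(size_t n_samples)
{
    if (n_samples > (UINT16_MAX - 1u) / SAMPLE_BYTES)
        return 0;
    return (uint16_t)(1u + SAMPLE_BYTES * n_samples);
}

/* Pointer to sample i inside the received frame. rx[0] is skipped because
   it was clocked in while the address byte was being sent. */
const uint8_t *burst_sample(const uint8_t *rx, size_t i)
{
    return rx + 1u + SAMPLE_BYTES * i;
}
```

The call would then look like HAL_SPI_TransmitReceive(&hspi2, txBuf, rxBuf, burst_frame_len(numOfSamples), HAL_MAX_DELAY);, where txBuf and rxBuf are assumed to be buffers of at least that length and txBuf[0] holds the read address.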

edited Oct 15 '18 at 12:58

answered Oct 15 '18 at 07:10

Thank you for your advice, I tried using `HAL_SPI_TransmitReceive(&hspi2, ReadAddr, pBuffer, 1 + 4, HAL_MAX_DELAY);` and it works a little faster. Unfortunately, it’s still too slow. – awyzlin Oct 16 '18 at 17:42
@awyzlin You can change the compilation settings to optimize for speed (-O3). If the accelerometer has a FIFO, you can use it to read N samples at once, or you can modify the HAL code to optimize it. – JMA Oct 17 '18 at 15:49

Johan · Answer 3 · 2020-10-11T11:10:10.327

What worked for me:

Read SPI registers directly
Optimize your function for speed

For an example function (code), see the solution by “JElli.1” in the ST Community.

edited Oct 11 '20 at 11:10

answered Oct 11 '20 at 00:39
