I'm currently playing around with the STM32F303xx family of chips. They feature core-coupled memory (CCM RAM), which, unlike the CCM found on the F4 series, can execute code. I have put the critical routines (e.g. ISRs) into the CCM and was wondering what the most efficient setup would be: putting the interrupt vector table into CCM as well, or into the normal SRAM. I'm kind of stuck on that one. Can anybody point me in the right direction?
I am not sure that it makes any difference to code execution performance directly; the critical thing is the bus architecture, where you place data and code, and whether you are performing DMA operations or writing to the flash memory.
The flash memory, the SRAM and the CCM are each on a separate bus. On many STM32 parts the SRAM, and on larger parts also the flash, is further divided across more than one bus. So while code is executed from one memory, data can be fetched concurrently from another. If, however, you place your data and instructions in the same memory, instruction and data accesses must be serialised. Equally, DMA operations to or from a memory can impact both data access and instruction fetch from that same memory.
For the most part there is little or no latency for code execution from on-chip flash on an STM32, thanks to the flash accelerator, so there may be little to gain from placing code in the CCM at all. Code that must execute while the flash memory is being programmed is an exception, since flash write/erase operations stall the bus for a significant length of time on STM32.
For performance, it is best to arrange things so that DMA, instruction fetch and data access mostly occur on separate buses. Bear in mind also that the CCM cannot be accessed by DMA or through bit-banding. So CCM is good for instructions or for data (where DMA or bit-band access is not required), but ideally not both at the same time.
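To make the CCM restrictions concrete, here is a small host-side sketch that checks whether a buffer overlaps the CCM (and so must not be handed to DMA) and computes the SRAM bit-band alias address, which simply does not exist for a CCM address. The CCM base and size are assumptions for an STM32F303xB/xC (8 KB at 0x10000000; other F303 variants have a different CCM size), and the function names are made up for illustration. The bit-band base and alias addresses are the fixed Cortex-M4 ones.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed STM32F303xB/xC layout: 8 KB CCM at 0x10000000.
 * Check the datasheet for your exact part. */
#define CCM_BASE       0x10000000u
#define CCM_SIZE       (8u * 1024u)

/* Cortex-M4 SRAM bit-band region (1 MB) and its alias region. */
#define SRAM_BB_BASE   0x20000000u
#define SRAM_BB_SIZE   0x00100000u
#define SRAM_BB_ALIAS  0x22000000u

/* True if [addr, addr + len) touches the CCM; such a buffer cannot be a
 * DMA source or destination. 64-bit math avoids overflow at the top of
 * the address space. */
static inline bool overlaps_ccm(uint32_t addr, uint32_t len)
{
    uint64_t start = addr;
    uint64_t end = (uint64_t)addr + len;   /* one past the last byte */
    return len != 0u && start < (uint64_t)CCM_BASE + CCM_SIZE && end > CCM_BASE;
}

/* Writes the alias word address for bit `bit` of the byte at `addr` and
 * returns true, but only if `addr` is inside the SRAM bit-band region.
 * CCM addresses are never inside it, so they always return false. */
static inline bool sram_bitband_alias(uint32_t addr, unsigned bit, uint32_t *alias)
{
    if (bit > 7u || addr - SRAM_BB_BASE >= SRAM_BB_SIZE)
        return false;
    *alias = SRAM_BB_ALIAS + (addr - SRAM_BB_BASE) * 32u + bit * 4u;
    return true;
}
```

In practice this kind of check is mostly useful as a debug assertion where DMA transfers are set up, since placing a DMA buffer in CCM typically fails silently rather than faulting.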
When either CCM or SRAM is used for code, you have the added linker/start-up complexity of placing code in RAM, and the possibility of code corruption from errant code or security flaws, with little or no significant performance benefit compared to on-chip flash. External memory of any kind will be significantly slower, partly because of the clock rate of the EMIF, and also because it is a single bus for both data and instructions for all external memories.
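The start-up side of that complexity is usually a copy loop like the one used for .data: code linked to run from CCM is stored in flash and copied into CCM before anything in it can be called. A minimal sketch, assuming a GNU toolchain; the section name .ccmram and the linker symbols _siccmram, _sccmram and _eccmram are assumptions (they depend entirely on your linker script), and TIM2_IRQHandler is a hypothetical stand-in for whichever handler you move. The target-only part is guarded by the STM32F303xC device macro so the copy routine itself can be built and tested on a host.

```c
#include <stdint.h>

/* Copy an initialised section word by word from its load address (flash)
 * to its run address (RAM), exactly like the usual .data initialisation. */
static inline void copy_section(const uint32_t *load, uint32_t *start, const uint32_t *end)
{
    while (start < end)
        *start++ = *load++;
}

#if defined(STM32F303xC)
/* Hypothetical linker-script symbols; names vary between toolchains. */
extern const uint32_t _siccmram[];  /* load address of .ccmram in flash */
extern uint32_t _sccmram[];         /* start of .ccmram in CCM */
extern uint32_t _eccmram[];         /* end of .ccmram in CCM */

/* Example handler placed in CCM. The linker script must map .ccmram to
 * the CCM region with a flash load address (e.g. "> CCMRAM AT> FLASH"). */
__attribute__((section(".ccmram")))
void TIM2_IRQHandler(void)
{
    /* clear the interrupt flag and do the time-critical work here */
}

/* Call from Reset_Handler before any .ccmram function can run. */
void init_ccmram(void)
{
    copy_section(_siccmram, _sccmram, _eccmram);
}
#endif
```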

First off: thanks for the lengthy answer! When you say flash accelerator, I assume you mean the ART accelerator, right? While this is nice, from my understanding it only helps normal program flow and not interrupts, so ISRs executing from flash should have longer latency. Apart from that, the F3 doesn't even feature the ART accelerator. So putting the ISRs into CCM allows for shorter latency without interfering with DMA, but if I also put the vector table there, I have the situation you mentioned with code and data in the same memory; if I put it into SRAM, DMA can be affected. – L.K. Oct 04 '18 at 13:21
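If you want to measure both options, the vector table can be relocated at run time by copying it into RAM and pointing SCB->VTOR at the copy. This is a sketch assuming the CMSIS device header stm32f3xx.h. The entry count N_VECTORS is a placeholder you should take from your start-up file, and the buffer shown lands in ordinary SRAM; putting it in CCM instead would need a linker section mapped there. On Armv7-M the table must be aligned to the next power of two at or above its size in bytes, with a minimum of 128 bytes, which vtor_alignment() computes.

```c
#include <stddef.h>
#include <stdint.h>

/* Armv7-M rule: the vector table must be aligned to a power of two that is
 * >= its size in bytes (4 bytes per entry), and to at least 128 bytes. */
static inline uint32_t vtor_alignment(uint32_t n_entries)
{
    uint32_t size = n_entries * 4u;
    uint32_t align = 128u;
    while (align < size)
        align <<= 1;
    return align;
}

/* Plain word copy of the table (initial SP followed by handler addresses). */
static inline void copy_vectors(const uint32_t *src, uint32_t *dst, size_t n_entries)
{
    for (size_t i = 0; i < n_entries; ++i)
        dst[i] = src[i];
}

#if defined(STM32F303xC)
#include "stm32f3xx.h"

/* Placeholder: 16 system entries plus your device's IRQ count. aligned(512)
 * covers up to 128 entries; recompute with vtor_alignment() if this changes. */
#define N_VECTORS 98u

/* Ordinary .bss array, so it ends up in SRAM. */
static uint32_t ram_vectors[N_VECTORS] __attribute__((aligned(512)));

void relocate_vectors(void)
{
    /* Copy from wherever VTOR points now (normally the flash table). */
    const uint32_t *current = (const uint32_t *)(uintptr_t)SCB->VTOR;

    __disable_irq();
    copy_vectors(current, ram_vectors, N_VECTORS);
    SCB->VTOR = (uint32_t)(uintptr_t)ram_vectors;
    __DSB();   /* ensure the VTOR write completes before IRQs are re-enabled */
    __enable_irq();
}
#endif
```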
Re: interrupt latency; possibly, but meeting a deadline is meeting a deadline. I would only add complexity when you are failing to meet deadlines. That said, if interrupt latency and processing time are critical, then CCM is a good option, since it cannot be extended by DMA bus contention or delayed by flash writes. – Clifford Oct 04 '18 at 13:39
I don't know this as fact, which is why it's a comment, but there are 12 cycles of latency from asserting the interrupt to executing the first instruction; the ART accelerator may be getting to work in that time, fetching the ISR code. – Colin Oct 04 '18 at 13:42
I am not referring to the ART, but rather to the _prefetch buffer_. Running at 72 MHz the flash needs 2 wait-states, but the prefetch buffer mitigates that significantly. Have you enabled the prefetch buffer? It is not enabled on reset; your start-up code probably sets it, but it is worth checking. – Clifford Oct 04 '18 at 13:55
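Both the wait-state setting and the prefetch state live in the FLASH_ACR register, so they are easy to check. This sketch encodes the F303 thresholds from the reference manual (RM0316) as I read them: 0 wait states up to 24 MHz, 1 up to 48 MHz and 2 up to 72 MHz. It uses local mask names rather than the CMSIS FLASH_ACR_* macros so that it can run on a host; the helper names are hypothetical.

```c
#include <stdint.h>

/* FLASH_ACR fields (RM0316): LATENCY in bits [2:0], PRFTBE = bit 4
 * (prefetch enable), PRFTBS = bit 5 (prefetch status, read-only). */
#define ACR_LATENCY_MASK  0x7u
#define ACR_PRFTBE        (1u << 4)
#define ACR_PRFTBS        (1u << 5)

/* Flash wait states needed for a given SYSCLK on the F303. */
static inline int flash_wait_states(uint32_t sysclk_hz)
{
    if (sysclk_hz <= 24000000u) return 0;
    if (sysclk_hz <= 48000000u) return 1;
    if (sysclk_hz <= 72000000u) return 2;
    return -1;   /* above the rated maximum */
}

/* New ACR value: latency for sysclk_hz, prefetch enabled, other bits kept. */
static inline uint32_t acr_for_sysclk(uint32_t acr, uint32_t sysclk_hz)
{
    int ws = flash_wait_states(sysclk_hz);
    if (ws < 0)
        return acr;
    return (acr & ~ACR_LATENCY_MASK) | (uint32_t)ws | ACR_PRFTBE;
}

/* Whether the prefetch buffer is actually running. */
static inline int prefetch_active(uint32_t acr)
{
    return (acr & ACR_PRFTBS) != 0u;
}
```

On the target you would write FLASH->ACR = acr_for_sysclk(FLASH->ACR, 72000000u) before switching SYSCLK up to 72 MHz (wait states must be raised before the clock), and prefetch_active(FLASH->ACR) answers the "have you enabled it?" question directly.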
Regarding @colin's comment, see http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.faqs/ka16366.html. The 12 cycles are intrinsic, of course; the CCM will not improve that. Wait-states are added to this, but if 2 wait-states make the difference between hitting a deadline or not, or between meeting determinism requirements or not, you may have more to worry about! My point really is that there is no simple answer. Build it, test it; the result will depend on many factors. – Clifford Oct 04 '18 at 14:01
@Clifford thanks again for answering. It's not so much that I need it now; I'm more interested in understanding the hardware correctly. You are of course right that it adds complexity. It's 4 wait states if you include the exit, though, plus whatever might happen inside the ISR, but yes, it depends. According to the reference manual, the prefetch buffer is enabled on reset on the F3. What I did find, though, is that running the ISR from CCM seems to reduce jitter in the ISR execution. But again, you are right that in most use cases it is not needed. So thanks again! – L.K. Oct 04 '18 at 14:38
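One way to follow the "build it, test it" advice and put a number on the jitter observed here is the Cortex-M4 DWT cycle counter: timestamp the first instruction of a periodic ISR, then compare the spread of the intervals between a build with the handler in flash and one with it in CCM. The jitter calculation below is plain C and host-testable; the counter set-up uses CMSIS core register names and is guarded by the STM32F303xC macro. The function names are hypothetical. Note that these timestamps show when the first handler instruction ran, so any variation in entry latency shows up in the intervals, which is the effect being compared.

```c
#include <stddef.h>
#include <stdint.h>

/* Jitter of a periodic ISR from cycle-counter timestamps taken on entry:
 * largest interval between consecutive timestamps minus the smallest.
 * Unsigned subtraction handles one wrap of the 32-bit counter between
 * two consecutive samples. */
static inline uint32_t interval_jitter(const uint32_t *stamps, size_t n)
{
    if (n < 2)
        return 0;
    uint32_t lo = UINT32_MAX, hi = 0;
    for (size_t i = 1; i < n; ++i) {
        uint32_t d = stamps[i] - stamps[i - 1];
        if (d < lo) lo = d;
        if (d > hi) hi = d;
    }
    return hi - lo;
}

#if defined(STM32F303xC)
#include "stm32f3xx.h"

/* Enable the DWT cycle counter (CMSIS core_cm4.h register names). */
void cycle_counter_init(void)
{
    CoreDebug->DEMCR |= CoreDebug_DEMCR_TRCENA_Msk;
    DWT->CYCCNT = 0;
    DWT->CTRL |= DWT_CTRL_CYCCNTENA_Msk;
}
/* In the ISR under test, store DWT->CYCCNT as the first statement, then
 * compare interval_jitter() between a flash build and a CCM build. */
#endif
```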