Filling the frame buffer in external RAM is very slow on my embedded system

Question

I update a frame buffer in external RAM as character codes arrive from the UART, looking up each character in a font database.

The frame buffer is around 600kb and takes around 1.5 seconds to fill completely without using DMA. The external RAM is 8 MB. The frame buffer is in the data section, so the SDRAM controller gives it second priority; the text section has the highest priority. The SDRAM controller is configured to operate in burst mode.

The processor is an OMAP 3515 running at 200 MHz, and the external RAM runs at 133 MHz.

I am trying to find a way to fill the 600kb frame buffer in 40 milliseconds. Kindly assist me.

If you're looking for a code solution then you should post your existing code here. If you're looking for a hardware solution then your question is off-topic for Stack Overflow — , Oct 28 '13 at 08:44
I am looking for techniques for filling the frame buffer faster through settings in the SDRAM controller, not a hardware solution. — Akshay, Oct 28 '13 at 09:03
Then you should post your existing code here, or we could spend time suggesting what you have already done. What is the data rate through the UART? — , Oct 28 '13 at 09:05
Are you writing the entire buffer *per character* rather than merely updating the frame buffer block for the character alone? — Clifford, Oct 28 '13 at 14:27
Asking for "techniques" is jumping the gun a little since 600kb in 1.5 seconds sounds remarkably slow for even the most naive of mem copy approaches. It is more likely that you have a poor implementation or hardware configuration. The question is perhaps not so much "how can I make this faster", because it is currently so desperately slow you could only have done something horribly wrong. Have you configured the SDRAM controller to match the memory performance, and what else is running concurrently (interrupts, other threads)? — Clifford, Oct 28 '13 at 14:36

score 1 · Answer 1 · edited May 23 '17 at 12:10

Enable the MMU/MPU and turn on the I-cache and D-cache so that instruction fetches do not compete with the memory movement. Use `ldmia` and `stmia` instructions to ensure that transfers go out as bursts. Mark the graphics memory as write-bufferable; this allows the ARM to gang writes together. You may use an HSYNC or VSYNC interrupt to flush the buffers.
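As a rough illustration of the burst idea, here is a portable C sketch that moves eight 32-bit words per iteration. The function name `copy_words_burst` is made up for this example. On ARM, a compiler may lower the eight loads and eight stores to `ldmia`/`stmia`, but that depends on the compiler and flags, so check the generated assembly or hand-write the loop if it matters. It assumes both pointers are 32-bit aligned and the regions do not overlap.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: copy nwords 32-bit words, 32 bytes per iteration.
 * Assumes dst and src are 32-bit aligned and do not overlap. */
void copy_words_burst(uint32_t *restrict dst, const uint32_t *restrict src,
                      size_t nwords)
{
    while (nwords >= 8) {
        /* Load eight words first, then store eight words, so the loads
         * and the stores each form one contiguous 32-byte group. */
        uint32_t w0 = src[0], w1 = src[1], w2 = src[2], w3 = src[3];
        uint32_t w4 = src[4], w5 = src[5], w6 = src[6], w7 = src[7];
        dst[0] = w0; dst[1] = w1; dst[2] = w2; dst[3] = w3;
        dst[4] = w4; dst[5] = w5; dst[6] = w6; dst[7] = w7;
        src += 8;
        dst += 8;
        nwords -= 8;
    }
    /* Copy the remaining 0..7 words one at a time. */
    while (nwords > 0) {
        *dst++ = *src++;
        nwords--;
    }
}
```

Whether this beats the library `memcpy` on a given toolchain is something to measure; many ARM C libraries already use an equivalent loop.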

As per Clifford, your current algorithm may not be optimal. Ensure that source and destination are aligned to at least 32 bits. It is not clear whether you are simply copying memory or whether the source has a different pixel format, stride, etc. If you are doing intense plane calculations, then NEON can accelerate some of the pixel unpacking operations. However, ensuring your I-cache is on will make any algorithm many times faster. If you can align your glyphs so that you do not need read-modify-write cycles to the video RAM, you can get a speed-up. Alternatively, you can use a shadow frame buffer that is copied wholesale to the main video RAM on a VSYNC or HSYNC interrupt.
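A minimal sketch of the aligned-glyph and shadow-buffer ideas follows. Everything specific in it is an assumption, not something stated in the question: a 640x480 display at 16 bits per pixel (which happens to come to 600 KiB), a 1-bit-per-pixel font with 8x16 glyphs, and the names `shadow_fb`, `draw_glyph`, `present` and `shadow_pixel`. Because each glyph cell starts on a 32-bit boundary, every store is a whole word and nothing has to be read back from the buffer.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed geometry: 640x480, 16 bpp, 8x16 glyph cells (all hypothetical). */
enum { FB_WIDTH = 640, FB_HEIGHT = 480, GLYPH_W = 8, GLYPH_H = 16 };
enum { ROW_WORDS = FB_WIDTH / 2, FB_WORDS = FB_WIDTH * FB_HEIGHT / 2 };

/* Shadow buffer; on the target this would live in cacheable memory. */
static uint32_t shadow_fb[FB_WORDS];

/* Pack two 16-bit pixels into one word, left pixel at the lower address. */
static uint32_t pack2(uint16_t left, uint16_t right)
{
    uint16_t px[2] = { left, right };
    uint32_t word;
    memcpy(&word, px, sizeof word);
    return word;
}

/* Draw one glyph at character cell (col, row). Each glyph byte is one
 * row of 8 pixels, MSB leftmost. Only whole 32-bit words are written. */
void draw_glyph(const uint8_t glyph[GLYPH_H], unsigned col, unsigned row,
                uint16_t fg, uint16_t bg)
{
    uint32_t *dst = shadow_fb + (size_t)row * GLYPH_H * ROW_WORDS
                              + (size_t)col * (GLYPH_W / 2);
    for (int y = 0; y < GLYPH_H; y++) {
        unsigned bits = glyph[y];
        for (int w = 0; w < GLYPH_W / 2; w++) {
            uint16_t left  = (bits & 0x80u) ? fg : bg;
            uint16_t right = (bits & 0x40u) ? fg : bg;
            dst[w] = pack2(left, right);
            bits <<= 2; /* move the next two pixels into bits 7 and 6 */
        }
        dst += ROW_WORDS;
    }
}

/* Copy the whole shadow buffer to video RAM, e.g. from a VSYNC handler. */
void present(uint32_t *video_ram)
{
    memcpy(video_ram, shadow_fb, sizeof shadow_fb);
}

/* Read one pixel back from the shadow buffer (for checking only). */
uint16_t shadow_pixel(unsigned x, unsigned y)
{
    uint16_t px[2];
    memcpy(px, &shadow_fb[(size_t)y * ROW_WORDS + x / 2], sizeof px);
    return px[x & 1];
}
```

With this layout a character arriving on the UART costs 64 word stores into cached memory, and the external video RAM only sees one sequential copy per refresh. If the real pixel format or glyph size differs, the packing step changes but the structure should carry over.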

See ARM memtest for some hints on allowing the ARM to saturate the bus.

The main point in the post and major take-away is to minimize the amount of writing and reading. Your algorithm is **memory bound**. There are lots of **ARM** features that can reduce this, but by organizing your [*software blitter*](http://en.wikipedia.org/wiki/Blitter), you can also get a good gain. — artless noise, Oct 28 '13 at 20:24
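
For scale, a back-of-the-envelope check using only the figures in the question, and assuming "600kb" means 600 × 1024 = 614400 bytes:

$$
\frac{614400\ \text{B}}{1.5\ \text{s}} \approx 400\ \text{KB/s},
\qquad
\frac{614400\ \text{B}}{0.040\ \text{s}} \approx 15000\ \text{KB/s} \approx 14.6\ \text{MB/s}
$$

$$
\frac{200 \times 10^{6}\ \text{cycles/s} \times 1.5\ \text{s}}{614400\ \text{B}} \approx 488\ \text{cycles/byte},
\qquad
\frac{200 \times 10^{6}\ \text{cycles/s} \times 0.040\ \text{s}}{614400\ \text{B}} \approx 13\ \text{cycles/byte}
$$

So the target needs a 37.5x speed-up, and the current code spends roughly 488 CPU cycles on every byte written.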
