
I am currently setting up a benchmark to compare microcontrollers (based on PowerPC). Can anyone point me to documentation that explains in detail which factors matter most for benchmarking? In other words, I am looking for documentation with detailed information about the factors to consider when improving the performance of core peripherals and memory banks.
It would also be very helpful if someone could suggest algorithms.

waq

1 Answer


There is only one useful way, and that is to write your application for both and time your application. Benchmarks are for the most part bogus. There are too many factors, and it is quite trivial to craft a benchmark that takes advantage of the differences, or even takes advantage of the common features in a way that makes two things look different.

I perform this stunt on a regular basis, most recently with this code:

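.globl ASMDELAY
ASMDELAY:
    subs r0,r0,#1     @ decrement the loop count in r0 and set the flags
    bne ASMDELAY      @ loop until r0 reaches zero
    bx lr             @ return to the caller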

This ran on a Raspberry Pi (bare metal), and on the same Raspberry Pi: I was not comparing two boards, just comparing one to itself. It is plain assembly, so it does not even involve the compiler features and tricks that you can build into a benchmark intentionally or accidentally. Two of those three instructions matter for benchmarking purposes; have the loop run many tens of thousands of times (I think I used 0x100000). The punchline: those two instructions in a loop ran as fast as 93662 timer ticks and as slow as 4063837 timer ticks for 0x10000 loops. Certainly the instruction cache and branch prediction were turned on and off for various tests. But even with both branch prediction and the instruction cache on, these two instructions vary in speed depending on where they lie within the fetch line and the cache line.
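The measurement pattern described above (run the same loop repeatedly and keep the fastest and slowest result) could be sketched in hosted C roughly as follows. This is only an illustration under stated assumptions: `delay_loop`, `tick_range`, and `time_delay_loop` are hypothetical names, the standard `clock()` call stands in for the hardware timer you would read on bare metal, and the sketch does not toggle the cache, branch prediction, or code alignment, which is where the large spread above came from.

```c
#include <assert.h>
#include <time.h>

/* Stand-in for ASMDELAY. The volatile counter keeps the compiler from
   deleting the otherwise empty loop. */
static void delay_loop(unsigned long count)
{
    volatile unsigned long n = count;
    while (n != 0) {
        n = n - 1;
    }
}

/* Fastest and slowest time seen, in clock() ticks. */
struct tick_range {
    clock_t fastest;
    clock_t slowest;
};

/* Time `runs` executions of the same loop and keep the extremes.
   Assumption: clock() replaces the bare-metal timer read. */
struct tick_range time_delay_loop(unsigned long count, int runs)
{
    assert(runs > 0);
    struct tick_range r = { 0, 0 };
    for (int i = 0; i < runs; i++) {
        clock_t start = clock();
        delay_loop(count);
        clock_t elapsed = clock() - start;
        if (i == 0 || elapsed < r.fastest) {
            r.fastest = elapsed;
        }
        if (i == 0 || elapsed > r.slowest) {
            r.slowest = elapsed;
        }
    }
    return r;
}
```

Calling `time_delay_loop(0x10000UL, 20)` once per configuration (cache on or off, code shifted by a few nops) and comparing the ranges is one way to see how much the same two instructions can vary.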

A microcontroller makes this considerably worse, depending on what you are comparing. Some have flash that can use the same wait-state setting across a wide range of clock speeds. Others are speed limited, and for every N MHz you have to add another wait state. Where you set your clock then affects performance across that range, and especially just below and just above the boundary where a wait state is added. For example, compare just under 24 MHz with 24 MHz plus an extra wait state: if that step took you from 2 to 3 wait states, each fetch now waits 50% more cycles. Just under 36 MHz the part may still be at 3 wait states, but 3 wait states just under 36 MHz is faster than 3 wait states at 24 MHz. If you run the same code from SRAM instead of flash on those platforms, there usually isn't a wait-state issue: the SRAM can usually match the CPU clock, so that code may be faster at any speed than the same code run from flash.
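The wait-state arithmetic can be made concrete with a small calculation. This is a rough sketch that assumes a simplified model in which one flash fetch costs one clock cycle plus the configured wait states; real parts have prefetch buffers and wider fetch lines, so treat the numbers as illustrative only. The function name `fetch_time_ns` is hypothetical.

```c
#include <assert.h>

/* Time in nanoseconds for one flash fetch.
   Assumption (simplified model): a fetch costs 1 clock cycle plus
   `wait_states` extra cycles, with no prefetch or caching. */
double fetch_time_ns(double clock_mhz, unsigned wait_states)
{
    assert(clock_mhz > 0.0);
    double cycle_ns = 1000.0 / clock_mhz;   /* one clock period in ns */
    return cycle_ns * (1.0 + (double)wait_states);
}
```

Under this model, 24 MHz gives 125 ns per fetch at 2 wait states and about 167 ns at 3 wait states, while 36 MHz at 3 wait states gives about 111 ns. Crossing the boundary makes fetching slower even though the clock went up slightly, and the higher clock only wins back the loss further along the range.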

If you are comparing two microcontrollers from the same vendor and family, then it is usually pointless. The internals are the same; the parts usually just vary by how many: how many flash banks, how many SRAM banks, how many UARTs, how many timers, how many pins, etc.

One of my points is that if you don't know the nuances of the overall architecture, you can possibly make the same code you are running now, on the same board, anywhere from a few percent to tens of times faster simply by understanding how things work: enabling features you didn't know were there, or properly aligning the code that is exercised often (simply rearranging your functions within a C file can and will affect performance). Adding one or more nops in the bootstrap to change the alignment of the whole program can and will change performance.

Then you get into compiler differences and compiler options. You can play with those and also get anything from a small gain to a several-fold or even dozens-fold improvement (or loss).

So at the end of the day, the only thing that matters is this: I have an application, it is the final binary, and how fast does it run on A? Then I port that application, the final binary for B is done, and how fast does it run there? Everything else can be manipulated; the results can't be trusted.

old_timer
  • ''if you are comparing two microcontrollers from the same vendor and family then it is usually pointless, the internals are the same they usually just vary by how many, how many flash banks how many sram banks how many uarts, how many timers, how many pins, etc.'' Currently I have exactly this case: I need to compare two microcontrollers whose cores differ slightly in their specifications. For example, one has a data cache and local data RAM. How can I set up a benchmark for these kinds of differences? – waq Aug 26 '16 at 06:10
  • The one with data cache and local RAM is going to have faster memory accesses, so if your programs use memory, that one will be faster by some percentage. External memory is likely going to be slower. Do you need a benchmark for that? If you need to perform memory tests, pound on an address, do random accesses, and do linear accesses, with and without the cache on the one that has it. I think we already know what the answer is, though. – old_timer Aug 26 '16 at 10:11
  • Exactly, I need a benchmark to prove which microcontroller has better efficiency. Moreover, it is not only memory: I have to compare and test every aspect of the two microcontrollers, and based on this knowledge I have to profile the application software. Basically, I have to improve the performance of the application software. So, based on the previous discussion, I am looking for the important factors to consider for improving application software performance. Could you please recommend some literature references or any other material? Thanks – waq Aug 27 '16 at 07:27
  • The datasheets for each part. – old_timer Aug 27 '16 at 11:36
  • Thanks for your time. In terms of core performance measurement, could you suggest some benchmarks or algorithms related to automotive control? – waq Aug 29 '16 at 11:47
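
The memory test suggested in the comments above (pound on one address, do linear accesses, do random accesses) could be sketched roughly as follows. Everything here is an assumption made for illustration: the names `access_pattern` and `time_reads` are hypothetical, `clock()` stands in for a hardware timer, and turning the data cache on or off is platform specific and not shown. The index calculation is included in the measured time, so only compare runs of the same pattern across the two parts or cache settings.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

enum access_pattern { SAME_ADDRESS, LINEAR, RANDOM };

/* Read `accesses` bytes from `buf` using the given pattern and return the
   elapsed clock() ticks. The volatile pointer forces every read to happen,
   and the checksum gives the caller a value to sanity-check. */
clock_t time_reads(const volatile uint8_t *buf, size_t len, size_t accesses,
                   enum access_pattern pattern, uint32_t *checksum)
{
    assert(len > 0);
    uint32_t lcg = 12345u;   /* state of a small linear congruential generator */
    uint32_t sum = 0;
    clock_t start = clock();
    for (size_t i = 0; i < accesses; i++) {
        size_t index = 0;                        /* SAME_ADDRESS: always byte 0 */
        if (pattern == LINEAR) {
            index = i % len;                     /* walk the buffer in order */
        } else if (pattern == RANDOM) {
            lcg = lcg * 1664525u + 1013904223u;  /* pseudo-random step */
            index = (size_t)(lcg >> 8) % len;
        }
        sum += buf[index];
    }
    clock_t elapsed = clock() - start;
    *checksum = sum;
    return elapsed;
}
```

Running each pattern on a buffer placed in the memory under test (local data RAM, cached SRAM, external memory) and comparing the tick counts gives a rough picture of the differences discussed in the comments.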