Has anyone measured the MIPS of an LPC1788 board? I recently calculated a figure using the following code, running from ROM:

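#include <stdint.h>

/* Incremented by the SysTick interrupt, configured to fire every 1 ms. */
volatile uint32_t tick;

void SysTick_Handler(void)
{
    tick++;
}

unsigned long loops_per_ms;

/* Assembly busy-wait loop (delay.s): n iterations of SUBS + BHI. */
extern void __delay(unsigned long n);

/* Returns the estimated MIPS, assuming 2 instructions per delay iteration. */
int calculate_mips(void)
{
    int prec = 8;              /* bits refined after the coarse search */
    unsigned long ji;
    unsigned long loop;

    /* Coarse search: double loops_per_ms until one delay spans a tick. */
    loops_per_ms = 1 << 12;

    while (loops_per_ms) {
        ji = tick;

        while (ji == tick) ;   /* wait for a tick edge */
        ji = tick;
        __delay(loops_per_ms);

        if (ji != tick)
            break;             /* the delay took longer than 1 ms */

        loops_per_ms <<= 1;
    }

    /* Fine search: back off to the last value that fit in 1 ms, then
       set one bit at a time, keeping it only if the delay still fits. */
    loops_per_ms >>= 1;
    loop = loops_per_ms >> 1;

    while (prec--) {
        loops_per_ms |= loop;

        ji = tick;

        while (ji == tick) ;
        ji = tick;
        __delay(loops_per_ms);

        if (ji != tick)
            loops_per_ms &= ~loop;

        loop >>= 1;
    }

    /* loops/ms * 1000 ms/s * 2 instructions/loop / 10^6 = loops_per_ms / 500 */
    return loops_per_ms / 500;
}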

delay.s:

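  PUBLIC __delay
  SECTION .text:CODE:REORDER(2)
  THUMB
__delay                         ; r0 = number of iterations
        subs r0, r0, #1         ; decrement and update flags
        bhi __delay             ; loop while r0 did not underflow and is non-zero
        bx lr                   ; return to caller
  END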
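To see how fine-grained the result of calculate_mips() can be, the same coarse-doubling plus 8-bit refinement search can be run on a host with the hardware swapped out. This is a minimal sketch under a simplifying assumption: the hypothetical helper delay_spans_tick() stands in for "start __delay(n) just after a tick edge and check whether tick changed", and the hypothetical parameter capacity is the number of loop iterations that fit in one 1 ms tick. The model ignores call and tick-polling overhead, and the capacity values used below are illustrative, not measured.

```c
#include <stdbool.h>

/* Hypothetical model of the hardware: a delay of n iterations, started
   right after a tick edge, crosses the next tick once n reaches capacity
   (the assumed number of iterations per 1 ms tick). */
static bool delay_spans_tick(unsigned long n, unsigned long capacity)
{
    return n >= capacity;
}

/* Same coarse + fine search as calculate_mips(), with SysTick and
   __delay replaced by delay_spans_tick(). Returns loops_per_ms. */
unsigned long simulate_calibration(unsigned long capacity)
{
    int prec = 8;                        /* bits refined after the coarse pass */
    unsigned long loops_per_ms = 1UL << 12;
    unsigned long loop;

    /* Coarse pass: double until one delay call spans a tick. */
    while (loops_per_ms) {
        if (delay_spans_tick(loops_per_ms, capacity))
            break;
        loops_per_ms <<= 1;
    }

    /* Step back to the last value that fit, then refine one bit at a time. */
    loops_per_ms >>= 1;
    loop = loops_per_ms >> 1;
    while (prec--) {
        loops_per_ms |= loop;            /* tentatively set the next bit */
        if (delay_spans_tick(loops_per_ms, capacity))
            loops_per_ms &= ~loop;       /* too long: clear it again */
        loop >>= 1;
    }
    return loops_per_ms;
}
```

With capacity 40000 the function returns 39936, and with 30000 it returns 29952, the same loops_per_ms values reported in this question. Because only 8 bits are refined after the coarse pass, every capacity from 39937 to 40064 also returns 39936, so in that range the result can read low by up to 128 iterations per ms.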

With the IAR IDE I got loops_per_ms = 39936, which gives 79 MIPS; with Keil I got loops_per_ms = 29952, which gives 59 MIPS.

The MCU clock is set to 120 MHz, so according to the datasheet the MIPS should be 1.25 × 120 = 150 MIPS. I think running the code from ROM slows it down.
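The arithmetic behind these figures can be written out as a small sketch. mips_from_loops() uses the conversion the asker gives in the comments (two instructions per iteration, so MIPS = loops_per_ms / 500), and expected_mips() reproduces the 1.25 × 120 datasheet estimate. cycles_per_iteration() is an added, hypothetical helper that the thread does not compute: it simply divides the clock frequency by the measured iteration rate.

```c
/* MIPS from a calibrated loops_per_ms, assuming 2 instructions per
   iteration: loops/ms * 1000 ms/s * 2 / 10^6 = loops_per_ms / 500. */
double mips_from_loops(unsigned long loops_per_ms)
{
    return loops_per_ms / 500.0;
}

/* Datasheet-style estimate: MIPS per MHz times the clock in MHz. */
double expected_mips(double mips_per_mhz, double clock_mhz)
{
    return mips_per_mhz * clock_mhz;
}

/* Average clock cycles spent per delay-loop iteration at cpu_hz. */
double cycles_per_iteration(unsigned long loops_per_ms, double cpu_hz)
{
    return cpu_hz / (loops_per_ms * 1000.0);
}
```

For loops_per_ms = 39936 this gives about 79.9 MIPS and about 3.0 cycles per iteration at 120 MHz; for 29952 it gives about 59.9 MIPS and about 4.0 cycles per iteration, against an expected_mips(1.25, 120.0) of 150.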

Does anybody have comments, or results of their own?

artless noise
Leslie Li

1 Answer

You cannot measure MIPS in that way. You have no control over how many instructions the compiler will use to implement a particular piece of high-level source code, and it will vary with optimisation level.

The core will achieve 1.25 MIPS per MHz, but that may be reduced by a number of factors. For example, on Cortex-M, on-chip flash and on-chip RAM use separate buses, so optimal performance is achieved when data is in RAM and code is in flash. If an instruction in flash needs to fetch data from flash, throughput is reduced because the instruction fetch and the data fetch must be sequential, whereas a data fetch from RAM can occur in parallel. If you ran the code from RAM you would really notice a slowdown, since all data and instruction fetches would be sequential. Most Cortex-M parts employ a flash accelerator of some sort to compensate for slower flash memory and achieve zero-wait-state code execution in most cases, though it is possible to write code perversely so as to defeat that benefit. Other causes of reduced MIPS are bus latency caused by DMA operations and peripheral wait states.

The simplest and most accurate method of measuring MIPS for your particular application (which, for the reasons mentioned above, may differ from the optimum) is to use a trace-capable debugger, which will capture every instruction executed over a period.

Clifford
  • Thanks a lot for your reply. The __delay routine is written in assembly precisely to guarantee that each delay-loop iteration is exactly two instructions, and every instruction is executed in one clock cycle. That is why the MIPS I calculated is loops_per_ms / 500, i.e. loops_per_ms * 1000 * 2 / 1000000. This code runs just after the board boots, with only the CPU frequency set to 120 MHz and SysTick enabled to generate an IRQ every 1 ms. Do you think this method gives a result close to the real MIPS? – Leslie Li Sep 04 '13 at 05:54
  • I see what you mean on closer inspection. I'll need to think about that! First of all, I'd at least toggle a GPIO pin on each SysTick and monitor it on a scope, logic analyser, or timer/counter instrument to ensure you have the clocking set up correctly. – Clifford Sep 04 '13 at 06:16