
For my current project I have to investigate the runtime behavior (cycles used) of different algorithms on a Cortex-M4. The algorithms are pure computation in C, with no I/O and no interrupts. Any hints or ideas on how to do this?

My current idea is to create a minimal application and use Renode (https://renode.io/) for cycle counting:

  • Create a file test.c containing one function with a fixed signature that runs my algorithm
  • Compile and link it to form a minimal application
  • Load the application and the required input data into Renode
  • Run the application
  • Extract the output data from Renode
  • Use the profiling data from Renode to rate the algorithms

And now the questions:

  • Has anyone used Renode or QEMU for similar purposes?
  • How do I create a truly minimal application? (crt0, ld flags)
  • Any other ideas for my problem?
  • How do I configure a minimal system in Renode? Which components form the minimal subset needed to successfully run a C program?

Regards Jan

Piotr Zierhoffer
Jan Baer
  • What have you tried so far? – KaiserKatze Dec 26 '20 at 15:32
  • Any reason you don't run the tests on a real Cortex-M4 based machine and use the internal cycle counter? That seems simpler to me. – andy mango Dec 26 '20 at 16:37
  • You benchmark programs, not algorithms. – Basile Starynkevitch Feb 09 '21 at 15:59
  • Run the test code on an M4 with interrupts disabled; then you measure only the time taken for the algorithm to run. If you use a lot of tooling, you may end up measuring the time of the algorithm plus the time needed to service the scaffolding. I don't understand what need renode.io fills here. – Morten Jensen Feb 09 '21 at 16:00
  • @JanBaer if the given answer provides a solution to your question, please see [What should I do when someone answers my question?](http://stackoverflow.com/help/someone-answers) (though given the age of the question, it is unclear whether the OP's account is active) – David C. Rankin Oct 29 '22 at 02:53

1 Answer


FYI: I work at Antmicro and am one of the authors of Renode.

There are many ways to perform such profiling. Note that Renode is not cycle-accurate, so it will not give you exact cycle counts, but you can track the progression of virtual time.

One of the possible approaches would be to use Renode's metrics analyzer. You can read the docs here: https://renode.readthedocs.io/en/latest/basic/metrics.html

It allows you to capture data and analyze it in Python or generate some graphs straight away:

# in Renode
(monitor) machine EnableProfiler "path_to_dump_file"

# in Bash
python3 tools/metrics_analyzer/metrics_visualizer/metrics-visualizer.py path_to_dump_file

You can also analyze the virtual time passed until a specific string appears on UART. This can be done with a Robot test. An example of timestamp extraction can be found here: https://github.com/renode/renode/blob/master/tests/platforms/QuarkC1000/QuarkC1000.robot#L44

${r}        Wait For Line On Uart     My String
            Do Something With Time    ${r.timestamp}

Another option would be to instrument your code and dump binary data from memory, if needed.
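A minimal sketch of that instrumentation idea, assuming the code runs on a real Cortex-M4 or on a model that implements the DWT cycle counter (worth verifying for whichever simulator is used). The harness reads the counter around the call and leaves the result in a fixed RAM structure that can be dumped afterwards. The names `algorithm_fn`, `sum_bytes`, `bench_run`, `g_bench_result` and `BENCH_MAGIC` are hypothetical; the register addresses are the architectural ARMv7-M ones. On any other host a fake counter is substituted so the file can be compiled and tried off-target.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__ARM_ARCH_7EM__)
/* Architectural ARMv7-M debug registers (Cortex-M4). */
#define DEMCR      (*(volatile uint32_t *)0xE000EDFCu) /* bit 24: TRCENA    */
#define DWT_CTRL   (*(volatile uint32_t *)0xE0001000u) /* bit 0:  CYCCNTENA */
#define DWT_CYCCNT (*(volatile uint32_t *)0xE0001004u)

static void cycles_init(void)
{
    DEMCR |= (1u << 24);  /* enable the DWT unit */
    DWT_CYCCNT = 0u;
    DWT_CTRL |= 1u;       /* start the cycle counter */
}
static uint32_t cycles_now(void) { return DWT_CYCCNT; }
#else
/* Host stand-in (an assumption for off-target builds): +10 per read. */
static uint32_t fake_cycles;
static void cycles_init(void) { fake_cycles = 0u; }
static uint32_t cycles_now(void) { fake_cycles += 10u; return fake_cycles; }
#endif

#define BENCH_MAGIC 0xC0FFEE01u  /* hypothetical "record is valid" marker */

/* Fixed signature that every algorithm under test is wrapped in. */
typedef uint32_t (*algorithm_fn)(const uint8_t *in, size_t len);

struct bench_result {
    uint32_t magic;
    uint32_t cycles;
    uint32_t checksum;
};

/* Lives at a fixed symbol so it can be located in the ELF and dumped. */
volatile struct bench_result g_bench_result;

/* Hypothetical algorithm under test: sums the input bytes. */
uint32_t sum_bytes(const uint8_t *in, size_t len)
{
    uint32_t sum = 0u;
    for (size_t i = 0; i < len; i++) {
        sum += in[i];
    }
    return sum;
}

void bench_run(algorithm_fn fn, const uint8_t *in, size_t len)
{
    cycles_init();
    uint32_t start = cycles_now();

    /* Calling through a volatile pointer and storing the result to a
     * volatile field keeps the compiler from moving the work outside
     * the two counter reads. */
    algorithm_fn volatile target = fn;
    g_bench_result.checksum = target(in, len);

    uint32_t end = cycles_now();
    g_bench_result.cycles = end - start;  /* unsigned math handles wraparound */
    g_bench_result.magic = BENCH_MAGIC;   /* written last: record is complete */
}
```

The measured interval includes the indirect call and one store, so subtracting the result of an empty function run through the same harness gives a fairer comparison between algorithms.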

You can also add hooks that are called when execution reaches a specific Program Counter value; the hook can then dump a timestamp to the log.
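On the firmware side, such hooks need stable addresses to attach to. A minimal sketch, assuming GCC or Clang: two non-inlined marker functions whose symbol addresses can be looked up in the ELF and used as the Program Counter values. All names here (`bench_begin`, `bench_end`, `bench_marker_hits`, `bench_input`, `bench_output`, `sum_of_squares`, `bench_marked_run`) are hypothetical.

```c
#include <stdint.h>

/* Counts marker calls; the volatile access also gives the markers a
 * side effect, so calls to them are not optimized away. */
volatile uint32_t bench_marker_hits;

/* Input and output live in volatile globals so they can be written and
 * read from outside, and so the computation stays between the markers. */
volatile uint32_t bench_input;
volatile uint32_t bench_output;

__attribute__((noinline)) void bench_begin(void) { bench_marker_hits++; }
__attribute__((noinline)) void bench_end(void)   { bench_marker_hits++; }

/* Hypothetical algorithm under test: 1^2 + 2^2 + ... + n^2. */
uint32_t sum_of_squares(uint32_t n)
{
    uint32_t sum = 0u;
    for (uint32_t i = 1u; i <= n; i++) {
        sum += i * i;
    }
    return sum;
}

void bench_marked_run(void)
{
    bench_begin();
    uint32_t n = bench_input;          /* volatile read: stays after bench_begin */
    bench_output = sum_of_squares(n);  /* volatile write: stays before bench_end */
    bench_end();
}
```

The difference between the timestamps logged at `bench_begin` and `bench_end` then brackets only the algorithm, not startup code or result handling.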

There are probably other options as well, but the right one depends on your specific use case.

Minimal system in Renode: depending on your software, it would require

  • a core
  • an NVIC (interrupt controller), if it's a Cortex-M
  • memory
  • a UART, if you want output.
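On the software side, the "minimal application" from the question can be little more than a vector table and a reset handler. The following is a sketch under stated assumptions, not a drop-in crt0: the linker symbols (`_sidata`, `_sdata`, `_edata`, `_sbss`, `_ebss`, `_estack`) and the `.isr_vector` section name must match your own linker script, and `init_ram` is a hypothetical helper. The target-only part is guarded so the RAM initialisation can also be compiled and checked on a host.

```c
#include <stddef.h>
#include <stdint.h>

/* Copy initialised data from its load address and zero the .bss area. */
void init_ram(uint32_t *data_dst, const uint32_t *data_src, size_t data_words,
              uint32_t *bss, size_t bss_words)
{
    for (size_t i = 0; i < data_words; i++) {
        data_dst[i] = data_src[i];
    }
    for (size_t i = 0; i < bss_words; i++) {
        bss[i] = 0u;
    }
}

#if defined(__ARM_ARCH_7EM__)
/* Assumed to be defined by the linker script (hypothetical names). */
extern uint32_t _sidata, _sdata, _edata, _sbss, _ebss, _estack;

int main(void);

void Reset_Handler(void)
{
    init_ram(&_sdata, &_sidata, (size_t)(&_edata - &_sdata),
             &_sbss, (size_t)(&_ebss - &_sbss));
    (void)main();
    for (;;) {
        /* main returned: there is nothing to go back to */
    }
}

/* The core reads the initial stack pointer from entry 0 and the reset
 * vector from entry 1; without interrupts, nothing else is needed. */
__attribute__((section(".isr_vector"), used))
void (*const vector_table[2])(void) = {
    (void (*)(void))&_estack,
    Reset_Handler,
};
#endif
```

With GCC this would typically be built with `-mcpu=cortex-m4 -mthumb -nostartfiles` and a custom linker script passed via `-T`, so that no default startup code is linked in.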

UPDATE:

We have added some tracing features that allow you to use https://www.speedscope.app/ or https://ui.perfetto.dev/ to display traces of execution, very useful in profiling.

The quick way to enable it for speedscope is:

cpu EnableProfilerCollapsedStack @path/to/trace true

For more details please see this chapter in the docs: https://renode.readthedocs.io/en/latest/advanced/execution-tracing.html

Piotr Zierhoffer