
What I mean is: given a source code file, is it possible to extract the energy consumption of a particular code block, or of a single instruction, using a tool like perf?

maaz
  • Possible duplicate of [Estimate Power Consumption Based on Running Time Analysis / Code Size](https://stackoverflow.com/q/1596252/608639), [Is it possible/easy to determine how much power a program is using?](https://stackoverflow.com/q/20907535/608639), [How to give an estimation of the energy consumed by a program on an ARM platform?](https://stackoverflow.com/q/35391643/608639), [How can I measure the energy consumption of my application on Windows Mobile](https://stackoverflow.com/q/724349/608639), etc. – jww Jul 17 '18 at 22:04
  • Please be more specific in your question. How long does the *"particular code block"* execute? What system are you running on? (Some processors have built-in power measurements.) – Zulan Jul 18 '18 at 07:54
  • @jww None of these are duplicates, or even helpful. The answer to the first one cites a paper from 2006, which is pretty much useless now. The answers to the second and third ones are not helpful at all. The fourth one is specific to Windows Mobile, which Microsoft stopped producing in 2010, so I highly doubt this question is about Windows Mobile. – Hadi Brais Jul 18 '18 at 09:47
  • The question is too broad. What is the architecture of interest? Do you want to measure energy per instruction or energy for a program or a relatively large block of instructions? These are different questions and have different answers. – Hadi Brais Jul 18 '18 at 09:50
  • @HadiBrais The target architecture is an Intel 8th-gen mobile processor. I want to measure the energy per instruction (preferably a real per-instruction figure, not the entire program's energy averaged over its instructions), or the energy of blocks of instructions, whichever is possible. A sample program: [link](https://paste.ee/p/N1GNG) – maaz Jul 18 '18 at 12:56

2 Answers


Use jRAPL, a framework for profiling the energy consumption of Java programs running on CPUs.

For example, the following snippet measures the energy consumed by a code block as the difference between readings taken before and after it:

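```java
double beginning = EnergyCheck.statCheck(); // energy reading before the block
doWork();                                   // the code block being measured
double end = EnergyCheck.statCheck();       // energy reading after the block
System.out.println(end - beginning);         // energy attributed to doWork()
```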

The paper describing this framework, "Data-Oriented Characterization of Application-Level Energy Optimization", is available at http://gustavopinto.org/lost+found/fase2015.pdf
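jRAPL works by reading Intel's RAPL energy counters. On Linux the package-level counter is also exposed through the powercap sysfs interface, so a minimal sketch of the same before/after measurement without jRAPL might look like this. It assumes a Linux system with the `intel_rapl` driver loaded and the package-0 domain at `/sys/class/powercap/intel-rapl:0`; on recent kernels `energy_uj` is typically readable only by root. The class name `RaplSysfs` and the `doWork` workload are hypothetical. The counter is in microjoules and wraps at `max_energy_range_uj`, so the delta allows for one wraparound.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class RaplSysfs {
    // Assumed powercap directory for the package-0 RAPL domain (Linux intel_rapl driver).
    static final Path DOMAIN = Paths.get("/sys/class/powercap/intel-rapl:0");

    // Reads one numeric sysfs attribute, e.g. energy_uj or max_energy_range_uj.
    static long readLong(Path file) throws IOException {
        return Long.parseLong(new String(Files.readAllBytes(file)).trim());
    }

    // Energy between two counter readings in microjoules, allowing for at most
    // one wraparound of the counter at maxRangeUj (approximate at the wrap point).
    static long deltaMicrojoules(long before, long after, long maxRangeUj) {
        return after >= before ? after - before : (maxRangeUj - before) + after;
    }

    // Hypothetical stand-in for the code block being measured.
    static long doWork() {
        long sum = 0;
        for (long i = 0; i < 1_000_000_000L; i++) {
            sum += i ^ (sum >>> 3);
        }
        return sum;
    }

    public static void main(String[] args) throws IOException {
        long maxRange = readLong(DOMAIN.resolve("max_energy_range_uj"));
        long before = readLong(DOMAIN.resolve("energy_uj"));
        long result = doWork();
        long after = readLong(DOMAIN.resolve("energy_uj"));
        // Using the result keeps the JIT from discarding doWork() as dead code.
        System.out.println("result " + result);
        System.out.println(deltaMicrojoules(before, after, maxRange) + " uJ for the whole package");
    }
}
```

Like jRAPL, this reports energy for the whole package (all cores and everything else running on them), so it only says something useful about code that runs long enough to dominate the reading.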

Hasanen
  • How are you going to construct microbenchmarks that run large blocks of the same asm instruction? Java code has to go through a JIT compiler before it becomes machine code, so it's the opposite of what you want. You could maybe use it to read RAPL and have your program spend all its time in a loop inside a native function. But jRAPL alone is nowhere near sufficient to answer the question. – Peter Cordes Nov 26 '18 at 11:23
  • @PeterCordes Thanks for the clarification, in that case, he could use Sim-Wattch [link](http://www.ecs.umass.edu/ece/koren/architecture/ETCache/tools_used.html) to simulate the execution of the program and estimate its energy consumption. – Hasanen Nov 26 '18 at 12:02
  • That link doesn't say anything about having an accurate model of Intel Sandybridge, for example. It talks about simulating a hypothetical CPU design with a 5-stage out-of-order pipeline, with no mention of any of the complexity that modern x86 adds on top of that, or of the much longer pipeline. They'd need an accurate model of various x86 CPUs, and if one existed, the answer to this question would be to just look up the table of each instruction's incremental execution-unit cost in its source code, because it would mean someone had already measured it... – Peter Cordes Nov 26 '18 at 12:08
  • *or you could apply it for a single instruction, whose value is the difference between beginning and end* Don't be ridiculous, Java doesn't even have inline asm so you couldn't make that happen even if it was useful. Plus, calling the relevant APIs will involve thousands of instructions at least, completely dominating the average-power signal from *one* extra instruction inside the timing window unless it's an extremely slow privileged instruction. [`wbinvd`](http://felixcloutier.com/x86/WBINVD.html) is probably the only one you might actually pick out from the noise on x86 without a loop. – Peter Cordes Nov 26 '18 at 13:06

There are tools for measuring power consumption (see @jww's comment for links), but they don't even try to attribute consumption to specific instructions the way `perf record` can statistically sample events and correlate them with the instructions that caused them.

You can get an idea by running a whole block of the same instruction, like you'd do when microbenchmarking the throughput or latency of an instruction, and then dividing the energy consumed by the number of instructions executed.
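A worked sketch of that division, under hypothetical numbers: assume the block is a loop whose body holds `unroll` copies of the instruction under test plus two loop-control instructions (e.g. `dec`/`jnz`), so each iteration retires `unroll + 2` instructions. In a real measurement you'd prefer to take the instruction count from a hardware counter (such as perf's `instructions` event) rather than computing it. The class and method names are illustrative only.

```java
public class EnergyPerInstruction {
    // Instructions retired by a loop that runs `iterations` times, each iteration
    // containing `unroll` copies of the instruction under test plus
    // `loopOverhead` loop-control instructions (e.g. dec + jnz = 2).
    static long instructionsExecuted(long iterations, int unroll, int loopOverhead) {
        return iterations * (unroll + loopOverhead);
    }

    // Average energy per instruction in nanojoules, given an energy delta in microjoules.
    static double nanojoulesPerInstruction(double deltaMicrojoules, long instructions) {
        return deltaMicrojoules * 1000.0 / instructions;
    }

    public static void main(String[] args) {
        long n = instructionsExecuted(100_000_000L, 98, 2); // 1e10 instructions
        double epi = nanojoulesPerInstruction(2_000_000.0, n); // hypothetical 2 J measured
        System.out.printf("%d instructions, %.3f nJ/instruction%n", n, epi);
    }
}
```

This average folds in everything the package spent during the run, not just the execution units, so on its own it is not a per-instruction cost.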

But a significant fraction of CPU power consumption is outside the execution units, especially for out-of-order CPUs running relatively cheap instructions like scalar ADD / AND. Different instructions can also trigger different memory-subsystem behaviour, like hardware prefetching.

Different patterns of data dependencies and latencies might matter. (Or maybe not: perhaps out-of-order schedulers draw roughly constant power regardless of how many instructions are waiting for their inputs to be ready, and setting up bypass forwarding vs. reading from the register file might not be significant.)

So a power or energy-per-instruction number is not directly meaningful on its own; it's mostly meaningful relative to something like a long block of dependent AND instructions. (AND should be one of the lowest-power instructions, probably with fewer transistors flipping inside the ALU than ADD.) That's a good baseline for power microbenchmarks that run 1 instruction or uop per clock, but maybe not for ones where the front-end is doing more or less work.
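One way to report a result relative to such a baseline, sketched with hypothetical names and numbers: measure the test loop and a dependent-AND baseline loop with the same harness, then subtract their average energies per instruction. This assumes both loops run at a similar instructions-per-clock rate, so that most of the power spent outside the execution units is common to both and cancels.

```java
public class BaselineRelative {
    // Average energy per instruction (same units as the energy argument).
    static double perInstruction(double energy, long instructions) {
        return energy / instructions;
    }

    // Extra energy per instruction of a test loop compared with a baseline loop
    // (e.g. a long chain of dependent AND instructions). Only meaningful when
    // both loops run at a similar instructions-per-clock rate.
    static double incrementalPerInstruction(double testEnergy, long testInstructions,
                                            double baseEnergy, long baseInstructions) {
        return perInstruction(testEnergy, testInstructions)
             - perInstruction(baseEnergy, baseInstructions);
    }

    public static void main(String[] args) {
        // Hypothetical readings in joules, 1e10 instructions for each loop.
        double extra = incrementalPerInstruction(2.6, 10_000_000_000L, 2.0, 10_000_000_000L);
        System.out.printf("%.3f nJ extra per instruction vs. baseline%n", extra * 1e9);
    }
}
```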

You might want to investigate how dependent AND vs. independent NOP or AND instructions affect energy per time or energy per instruction (i.e. how power outside the execution units scales with instructions per clock and/or register reads and write-back).
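Since Java has no inline asm, one hedged way to build those loops from a Java toolchain is to generate assembly source and assemble it separately. The sketch below (class name, function names and register choices are all hypothetical) emits GNU assembler, AT&T syntax, for the x86-64 SysV ABI: a loop whose body is `unroll` copies of one dependent `and` chain, four independent `and` chains, or `nop`s. The iteration count arrives in `%rdi` and must be at least 1. You'd call the generated functions from a native harness that also reads RAPL and an instruction counter.

```java
import java.util.Locale;

public class AndLoopGen {
    enum Pattern { DEPENDENT_AND, INDEPENDENT_AND, NOP }

    // Caller-saved registers in the x86-64 SysV ABI, so the loop may clobber them.
    private static final String[] REGS = {"%r8d", "%r9d", "%r10d", "%r11d"};

    // Emits a function name(uint64_t iterations) whose loop body repeats the
    // chosen instruction `unroll` times. iterations must be >= 1.
    static String generate(String name, Pattern pattern, int unroll) {
        StringBuilder sb = new StringBuilder();
        sb.append("    .text\n");
        sb.append("    .globl ").append(name).append('\n');
        sb.append(name).append(":\n");
        sb.append("1:\n");
        for (int i = 0; i < unroll; i++) {
            switch (pattern) {
                case DEPENDENT_AND:
                    // Every AND reads the previous result: one latency-bound chain.
                    sb.append("    and %r8d, %r8d\n");
                    break;
                case INDEPENDENT_AND: {
                    // Rotate over four registers: four chains that can run in parallel.
                    String r = REGS[i % REGS.length];
                    sb.append("    and ").append(r).append(", ").append(r).append('\n');
                    break;
                }
                case NOP:
                    sb.append("    nop\n");
                    break;
            }
        }
        sb.append("    dec %rdi\n");
        sb.append("    jnz 1b\n");
        sb.append("    ret\n");
        return sb.toString();
    }

    public static void main(String[] args) {
        for (Pattern p : Pattern.values()) {
            String name = "loop_" + p.name().toLowerCase(Locale.ROOT);
            System.out.print(generate(name, p, 100));
        }
    }
}
```

Each iteration retires `unroll + 2` instructions because of the `dec`/`jnz` pair. Comparing energy per second and energy per instruction across the three variants is the kind of experiment described above.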

Peter Cordes