2

I tried the vector add design example from Altera, but I can't find the kernel's throughput and latency in the compilation results.

I read the Altera programming guide, which mentions using profile.mon.

Is it possible to compile with -march=emulator --profile and then run aocl report?

Also, please let me know if there is any other way to get the throughput and latency of the kernel.

Simon Woodside
dev_55

2 Answers

2

The Altera SDK for OpenCL Best Practices Guide describes what information you can get from the profiler, with example screenshots and detailed explanations. Here is the link that takes you directly to that section.

I may be wrong, but I don't think it is possible to get profiling information from the emulator. I always build the full kernel to get it.

doqtor
  • Thanks for the info. When I build the full kernel I get a file named kernel_name.attrib in the bin folder with the following information: Vectorization: 1 Max_vectorization: 16 Copies: 2 Max_copies: 2 Throughput: 21.75 Copyfactor: 1 Sharing: 1 Max_sharing: 1 Unroll: 1 Max_unroll: 1 Throughput_unroll: 1 Aggressive_unroll: 1. There is a throughput value here, but I don't know whether it refers to the kernel. Is there any documentation that explains the fields in the .attrib file? – dev_55 Jun 01 '16 at 09:55
  • See [this](http://www.alteraforum.com/forum/showthread.php?t=50032&highlight=Copyfactor). It seems that value is not something you can rely on, as it is probably just an estimate. You can try to find more on the Altera OpenCL forum. – doqtor Jun 01 '16 at 11:07
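
The `.attrib` contents quoted in the comment above are flat `Key: value` pairs, so they are easy to load into a dictionary for inspection. This is a minimal Python sketch, assuming the file is plain text in exactly the form quoted; the helper name `parse_attrib` is hypothetical and not part of any Altera tool. As noted in the reply, the `Throughput` figure is probably only a compiler estimate.

```python
import re

def parse_attrib(text):
    """Parse whitespace-separated 'Key: value' pairs into a dict of floats.

    Assumes every value is numeric, as in the sample quoted in the comment.
    """
    pairs = re.findall(r"(\w+):\s*([-+]?\d+(?:\.\d+)?)", text)
    return {key: float(value) for key, value in pairs}

# Contents copied from the comment above (not a documented file format).
sample = (
    "Vectorization: 1 Max_vectorization: 16 Copies: 2 Max_copies: 2 "
    "Throughput: 21.75 Copyfactor: 1 Sharing: 1 Max_sharing: 1 "
    "Unroll: 1 Max_unroll: 1 Throughput_unroll: 1 Aggressive_unroll: 1"
)

attrib = parse_attrib(sample)
print(attrib["Throughput"])
```

To read a real file, pass `open("kernel_name.attrib").read()` instead of `sample`; the path is whatever your build produced in the bin folder.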
1

Simon, if you do not mind a small historical "adventure", you could download and install version 13.1 of the Altera (now Intel) OpenCL SDK. These older tools could print throughput estimates, either by default or via the --estimate-throughput switch. The estimates work only for Stratix V cards (e.g. PCIe385n_d5); Arria 10 did not exist back then. But since the devices are architecturally somewhat similar, this should give you a rough guideline. Afterwards, do not forget to submit a Service Request to Intel asking them to put these estimates back into the OpenCL SDK compiler.

If you are an even more adventurous type, you could simulate your kernel in ModelSim Intel Starter Edition (free), even without DDRx and PCIe models, and this would give you a cycle-accurate answer to the throughput and latency questions. You can generate the entire test bench automatically using Qsys.
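
Once a simulation gives you cycle counts, converting them to time-based figures is simple arithmetic. Here is a rough Python sketch, assuming you read two numbers off the waveform (cycles from launch to the first result, and cycles for the whole run) and that you know the kernel clock frequency from elsewhere, for example the fmax your full compile achieved. The function name and all numbers below are hypothetical placeholders, not measurements.

```python
def kernel_metrics(latency_cycles, total_cycles, work_items, clock_hz):
    """Convert simulated cycle counts into latency (s) and throughput (items/s).

    latency_cycles: cycles from kernel start until the first result appears
    total_cycles:   cycles from kernel start until the last result appears
    work_items:     number of work-items processed in the run
    clock_hz:       kernel clock frequency in Hz (assumed known separately)
    """
    latency_s = latency_cycles / clock_hz
    run_time_s = total_cycles / clock_hz
    throughput = work_items / run_time_s
    return latency_s, throughput

# Hypothetical example values: 200-cycle latency, 1200 cycles for
# 1000 work-items, 250 MHz kernel clock.
latency_s, throughput = kernel_metrics(
    latency_cycles=200, total_cycles=1200, work_items=1000, clock_hz=250e6
)
print(f"latency: {latency_s * 1e9:.1f} ns, throughput: {throughput:.3e} items/s")
```

Since the simulation leaves out the DDRx and PCIe models, numbers derived this way describe the kernel pipeline alone and not end-to-end performance with memory traffic.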

My Name