
I am using the ROCm software stack to compile and run OpenCL programs on a Polaris 20 (GCN 4th gen) AMD GPU, and I am wondering if there is a way to find out which compute unit (by id) on the GPU is currently executing a given work-item or wavefront.

In other words, can I associate a computation in a kernel with a specific compute unit or specific piece of hardware on the GPU, so I can keep track of which part of the hardware is being utilized while a kernel runs?

Thank you!

mjd
  • May I ask what you need that for? Is there no way to bypass your problem by using localId/globalId (thread-local-to-block id or thread global id) or groupId (block id)? On older OpenCL versions there was no way: https://stackoverflow.com/questions/19547197/how-to-get-compute-unit-id-at-runtime-in-opencl , but this is old – tryman Feb 15 '19 at 01:32
  • There isn't in standard OpenCL, so if anything this would be an implementation-specific extension - consult the vendor documentation. – pmdj Feb 15 '19 at 13:13
  • @tryman Thank you. We want to do a GPU micro-architecture reliability study. So no, using local/group/global IDs is too abstract for our case; we want to localize the computation to the actual hardware components, such as which compute unit is currently in use. Yes, I saw that post before, but it is very outdated, so I wanted to see if they have since introduced something like [finding the SM id](https://devtalk.nvidia.com/default/topic/481465/cuda-programming-and-performance/any-way-to-know-on-which-sm-a-thread-is-running-/) in CUDA on Nvidia. – mjd Feb 15 '19 at 18:57
  • There seems to be an [OpenCL extension for Arm GPUs](https://www.khronos.org/registry/OpenCL/extensions/arm/cl_arm_get_core_id.txt) supporting this feature, so that probably means there isn't support in the standard OpenCL specification up to 1.2. – tryman Feb 21 '19 at 18:47
  • I'm not sure if you can get the device id inside your kernel. If you can, then you could try partitioning your root device (e.g. your GPU or your CPU) into the maximum number of sub-devices possible (equal to the max compute units) and then query your device id (which should give you the sub-device id). – tryman Feb 21 '19 at 19:25
  • @tryman I see, thank you for your comments! So, the Arm extension has it. But why would querying the device id on a GPU give me the compute unit id? Also, "which" compute unit would that id correspond to? – mjd Feb 22 '19 at 14:58
  • If you check the latest OpenCL specification, which I had in mind when posting the previous comment, there is a way to partition your root device into sub-devices, each sub-device handling a subset of the compute units. If you have a 1-1 correspondence between a sub-device and a compute unit, then their ids would effectively be the same. – tryman Feb 22 '19 at 15:15
  • If you mean "which" as in where physically on the GPU, I don't know, but even CUDA with inline PTX can't tell you that. If you mean to which "original" id this sub-device id corresponds, then I have no idea, but I doubt that would be a concern. From what I understand, you just need different ids for different compute units, right? – tryman Feb 22 '19 at 15:21
  • In any case, I don't even know if this would work; that's why I'm not posting it as an answer. It's just an idea of where I would head to tackle this :) Also keep in mind that I am referring to the latest OpenCL specification. This means I don't know whether this capability exists in earlier OpenCL versions, or, if it exists, whether it works the same way. – tryman Feb 22 '19 at 15:23
  • @tryman Yes, I need unique ids for different physical compute units. – mjd Feb 26 '19 at 00:06
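For completeness, on ROCm's AMDGPU compiler the compute-unit id can in principle be read inside the kernel from the per-wave HW_ID hardware register via inline assembly, the GCN analogue of reading `%smid` in CUDA PTX. This is entirely implementation-specific and unsupported by standard OpenCL; the bit layout below is the one documented in AMD's GCN3/GCN4 ISA manuals, and the snippet is an untested sketch, not a portable solution:

```
// Implementation-specific sketch for ROCm OpenCL on GCN hardware (untested).
// HW_ID is a per-wave hardware register; per the AMD GCN3 ISA manual its
// fields include WAVE_ID [3:0], SIMD_ID [5:4], PIPE_ID [7:6], CU_ID [11:8],
// SH_ID [12], SE_ID [14:13].
__kernel void cu_probe(__global uint *out)
{
    uint hw_id;
    // s_getreg_b32 copies a hardware register into a scalar register;
    // "=s" asks the compiler for an SGPR destination.
    __asm volatile ("s_getreg_b32 %0, hwreg(HW_REG_HW_ID)" : "=s"(hw_id));

    uint cu_id = (hw_id >> 8)  & 0xF;  // compute unit within the shader array
    uint se_id = (hw_id >> 13) & 0x3;  // shader engine

    // Record which CU/SE executed each work-item.
    out[get_global_id(0)] = (se_id << 8) | cu_id;
}
```

Note that the value is only meaningful at the instant it is read: the scheduler decides wave placement, so different work-groups (and different runs) will land on different compute units.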
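The sub-device idea from the comments would look roughly like this on the host side. It is a sketch assuming the driver supports device fission (`CL_DEVICE_PARTITION_EQUALLY`) for GPUs; in practice many GPU drivers, including ROCm at the time of writing, only expose partitioning for CPU devices, so check `CL_DEVICE_PARTITION_PROPERTIES` first and expect this to fail at runtime on unsupported hardware:

```
#include <stdio.h>
#include <stdlib.h>
#include <CL/cl.h>

/* Sketch: partition the root GPU into one sub-device per compute unit,
 * so that work submitted to sub-device i is confined to compute unit i. */
int main(void)
{
    cl_platform_id platform;
    cl_device_id device;
    clGetPlatformIDs(1, &platform, NULL);
    clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, 1, &device, NULL);

    cl_uint num_cus;
    clGetDeviceInfo(device, CL_DEVICE_MAX_COMPUTE_UNITS,
                    sizeof num_cus, &num_cus, NULL);

    /* Request equal partitions of one compute unit each. */
    cl_device_partition_property props[] =
        { CL_DEVICE_PARTITION_EQUALLY, 1, 0 };

    cl_uint num_subs = 0;
    cl_int err = clCreateSubDevices(device, props, 0, NULL, &num_subs);
    if (err != CL_SUCCESS) {
        fprintf(stderr, "device fission unsupported on this GPU (err %d)\n", err);
        return 1;
    }

    cl_device_id *subs = malloc(num_subs * sizeof *subs);
    clCreateSubDevices(device, props, num_subs, subs, NULL);
    printf("partitioned %u compute units into %u sub-devices\n",
           num_cus, num_subs);
    free(subs);
    return 0;
}
```

Even where this works, the sub-device index is a logical id assigned by the runtime; the OpenCL specification makes no promise about which physical compute unit backs which sub-device.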

0 Answers