The nvvp CUDA profiler frontend offers an analysis that breaks down the reasons warps stall while waiting to execute their next instruction. The categories include "Execution latency", "Memory dependency", "Texture dependency", etc., and one category named "Other":
The pie chart's legend says:
Other - "The kernel was blocked for a[n] uncommon reason"
My questions:
- Does that mean the profiler can't figure out why execution was blocked, or is it just aggregating "uncommon reasons"?
- What are the more "common uncommon" reasons? As you can see, in some cases they can be far from negligible.
- Is there a list of all "uncommon reasons" somewhere?