In a research project, we are developing a special-purpose floating-point accelerator. Our original vision was a kind of "two-stage" or "nested" offload: ARM host -> RISC-V-managed accelerator cluster -> actual floating-point accelerator.
We were aiming for something like the following code:
// start on ARM host
#pragma omp target
{
    // we are on RISC-V
    #pragma omp target
    {
        // we are on the floating-point accelerator
        // ... do math
    }
}
In the newest OpenMP 5.2 API specification, under "13.8 target Construct -> Restrictions", I found the following paragraph:
"Device-affecting constructs, other than target constructs for which the ancestor device-modifier is specified, must not be encountered during execution of a target region."
This is new in OpenMP 5.2 as far as I can see, and it seems to explicitly forbid nested offloading. We would be very interested to understand:
- whether we understand this correctly, i.e., nested offload is now explicitly prohibited rather than merely "unspecified" as in the previous 5.1 API specification;
- if nested offloading is prohibited, which design considerations led the ARB to this decision;
- whether a kind of nested offload could be specified in future OpenMP API specifications to support nested offloading between heterogeneous accelerators.
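For comparison, a flattened, host-driven variant would presumably stay within the quoted restriction, because no target construct is encountered inside another target region. The following is only a minimal sketch under assumptions: the helper `pick_device`, the function `run_flattened`, and the device numbers 0 (RISC-V cluster) and 1 (floating-point accelerator) are hypothetical, and the loops are placeholders for the real work.

```c
#ifdef _OPENMP
#include <omp.h>
#endif

#define N 8

/* Hypothetical helper: use the wanted device number if it exists,
   otherwise fall back to the host (initial device). */
static int pick_device(int wanted)
{
#ifdef _OPENMP
    int n = omp_get_num_devices();
    return wanted < n ? wanted : omp_get_initial_device();
#else
    (void)wanted;
    return 0; /* compiled without OpenMP: everything runs on the host */
#endif
}

/* Launches both stages from the host, one after the other, and
   returns the sum of the results. */
static double run_flattened(void)
{
    double data[N];
    double sum = 0.0;
    int riscv_dev = pick_device(0); /* assumption: RISC-V cluster */
    int fpu_dev = pick_device(1);   /* assumption: floating-point accelerator */

    /* Stage 1: preparation that would run on the RISC-V cluster. */
    #pragma omp target device(riscv_dev) map(from: data[0:N])
    {
        for (int i = 0; i < N; ++i)
            data[i] = (double)i;
    }

    /* Stage 2: the math that would run on the floating-point accelerator. */
    #pragma omp target device(fpu_dev) map(tofrom: data[0:N])
    {
        for (int i = 0; i < N; ++i)
            data[i] = data[i] * data[i];
    }

    /* Back on the ARM host: combine the results. */
    for (int i = 0; i < N; ++i)
        sum += data[i];
    return sum;
}
```

The drawback compared to the nested form is that the ARM host, not the RISC-V cluster, has to orchestrate the second offload, and the data passes through the host between the two stages.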
I would greatly appreciate an answer!
Best regards,
Kai Plociennik