2

in a research project we are developing a special-purpose floating-point accelerator. In this context, our original vision was to have a kind of "two-stage" or "nested" offload from ARM host -> RISCV-managed accelerator cluster -> actual floating-point accelerator.

So we wanted to aim for something like the following code:

// start on ARM host
#pragma omp target
{
  // we are on RISCV
  #pragma omp target
  {
    // we are on the floating-point accelerator
    ...do math
  }
}

In the newest OpenMP 5.2 API specification, I found under "13.8 target Construct -> Restrictions" the paragraph

"Device-affecting constructs, other than target constructs for which the ancestor device-modifier is specified, must not be encountered during execution of a target region."

This is new in OpenMP 5.2 as far as I see, and seems to explicitly forbid a concept of nested offloading. It would be very interesting for us to understand

-if we understand this correctly, i.e., nested offload is now explicitly prohibited instead of only being "unspecified" as in the previous 5.1 API specification.

-if nested offloading is prohibited, what were the design decisions leading to this by the ARB?

-if there is a possibility to include a kind of nested offload specification in future API specifications of OpenMP to support scenarios of nested offloading between heterogeneous accelerators?

I would appreciate an answer a lot!

Best regards,

Kai Plociennik

1 Answers1

4

This requires a bit of context to answer correctly. In the OpenMP specification, when we have a rule like that, it means the result is "unspecified behavior." The normally expected behavior in a case like this is that an implementation will issue a diagnostic, hence the "must", but if a given vendor decides that the behavior they wish to provide is that the nested constructs work, that is perfectly ok as far as the specification is concerned. It just means that the program is not going to be portable to other implementations.

With that bit of context then:

-if we understand this correctly, i.e., nested offload is now explicitly prohibited instead of only being "unspecified" as in the previous 5.1 API specification.

The target construct has never been fully supported in the target region. The original restriction that you elude to is this in 4.0:

If a target, target update, or target data construct appears within a target
region then the behavior is unspecified.

The effect of both of these is the same.

-if nested offloading is prohibited, what were the design decisions leading to this by the ARB?

It is less prohibited than it used to be, in that we support the device(ancestor) clause to allow reverse offload. The intention is to investigate supporting full nested offload, but there are substantial complexities in making that portable. Some implementations can support offloading to the same device synchronously, some cannot, others can support offloading to sibling devices easily, others can't. As a result, for now, we continue to leave it as unsupported as far as the specification is concerned.

-if there is a possibility to include a kind of nested offload specification in future API specifications of OpenMP to support scenarios of nested offloading between heterogeneous accelerators?

There is definitely a possibility, in fact we already have a minimal version of this, and I can think of at least one research implementation that does it as well. The real barrier is getting an interface to it that's portable. The smallest change likely to make this acceptable would be adding something like a requires clause for support for nested offload, but even there we have to answer quite a few questions about what the behavior would be.

If you are interested in building support like this, we are always looking for motivated people to push efforts like this. Your participation would be appreciated, feel free to PM me if you want more information.

Tom Scogland
  • 937
  • 5
  • 12