Using SYCL to run code on any OpenCL device doesn't require a custom compiler, as everything is done in a library (full of template magic), and a standard GCC/Clang will do just fine. Is this correct? (Especially in the case of triSYCL, which I'm using...)
If so... I know that simple expression trees can be extracted by overloading a bunch of operators on custom "handle" or "wrapper" classes, but this is not the case with control flow. Am I wrong?
Section 3.1 of this paper discusses the pros and cons of a few different approaches to adding EDSLs to C++, but I'm more interested in the actual technical implementation of the method SYCL uses.
I tried to look at the source at some SYCL-related projects (Eigen, TensorFlow, triSYCL, ComputeCpp, etc.) but so far I could not find the answer in them.
So: How can a SYCL library(?) discover the full control flow graph of a kernel, given as an ordinary C++ lambda, without needing a custom/extended compiler?