I'm trying to get a handle on SYCL's kernel_bundle functionality. My goal is to use it as a store (a kind of database) for my AOT-precompiled custom kernels. Ideally, I would precompile these kernels into a standalone library that other libraries can query at runtime for a specific implementation, and then pass the result around in an ABI-safe way across different compilation units.
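To make that goal concrete, here is a minimal sketch of the kind of lookup interface I have in mind. It is plain host-side C++ so that it builds without a SYCL toolchain, and everything in it is hypothetical: `find_kernel`, `kernel_launch_fn`, `kernel_entry`, and the two host functions are placeholders for launchers that would enqueue AOT-compiled SYCL kernels from a kernel_bundle.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

extern "C" {

// Hypothetical launcher signature: only C types cross the library boundary.
typedef void (*kernel_launch_fn)(const int* a, const int* b, int* c,
                                 std::size_t n);

// One registry row: a lookup name and the launcher it maps to.
struct kernel_entry {
    const char* name;
    kernel_launch_fn launch;
};

// Placeholder implementations. In the real library these would submit a
// precompiled SYCL kernel; here they run on the host (assumption).
static void add_host(const int* a, const int* b, int* c, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) c[i] = a[i] + b[i];
}

static void mul_host(const int* a, const int* b, int* c, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) c[i] = a[i] * b[i];
}

static const kernel_entry kRegistry[] = {
    {"add", &add_host},
    {"mul", &mul_host},
};

// Returns the launcher registered under `name`, or nullptr if none exists.
kernel_launch_fn find_kernel(const char* name) {
    assert(name != nullptr);
    for (const kernel_entry& entry : kRegistry) {
        if (std::strcmp(entry.name, name) == 0) return entry.launch;
    }
    return nullptr;
}

}  // extern "C"
```

A client library would call `find_kernel("add")`, check the result for `nullptr`, and invoke the returned function pointer. What I can't work out is how to back such a table with kernel_bundle.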
I've hit a couple of roadblocks:
- It seems the constructors for both sycl::kernel and sycl::kernel_bundle are either marked deleted or private.
- I'm a bit muddled about the proper workflow: How do I instantiate a kernel, bundle it, build this bundle, fetch the kernel using its ID, and then execute it with specialization constants?
I looked into the SYCL 2020 specification (https://registry.khronos.org/SYCL/specs/sycl-2020/html/sycl-2020.html), but some of the examples seem to miss the point of using a kernel_bundle, for instance the examples in sections 4.11.15.1 and 4.11.15.2. They obtain a kernel_bundle but then go on to execute a separate inline kernel implementation.
I would appreciate it if someone could shed light on this or share a comprehensive example illustrating AOT compilation and runtime specialization using kernel_bundle with custom kernels in SYCL.
Thanks in advance!
I was expecting something along these lines:
#include <sycl/sycl.hpp>
#include <iostream>
#include <vector>

// Kernel functor; its type also serves as the kernel name.
class AddKernel {
public:
    using ReadAcc = sycl::accessor<int, 1, sycl::access::mode::read>;
    using WriteAcc = sycl::accessor<int, 1, sycl::access::mode::write>;

    AddKernel(ReadAcc a, ReadAcc b, WriteAcc c) : a_(a), b_(b), c_(c) {}

    // SYCL 2020 requires the kernel call operator to be const.
    void operator()(sycl::id<1> idx) const {
        c_[idx] = a_[idx] + b_[idx];
    }

private:
    ReadAcc a_;
    ReadAcc b_;
    WriteAcc c_;
};

int main() {
    sycl::queue q;

    // Create data for the kernel
    std::vector<int> a = {1, 2, 3, 4, 5};
    std::vector<int> b = {5, 4, 3, 2, 1};
    std::vector<int> c(5);

    {
        // Create buffers for the data; results are copied back to the
        // vectors when the buffers go out of scope.
        sycl::buffer<int> buf_a(a.data(), a.size());
        sycl::buffer<int> buf_b(b.data(), b.size());
        sycl::buffer<int> buf_c(c.data(), c.size());

        // 1st approach (what I hoped would work; it does not compile
        // because sycl::kernel has no such constructor):
        // sycl::kernel add(AddKernel);
        // auto bundle = add.get_kernel_bundle();

        // 2nd approach: look the kernel up by its ID and build the bundle.
        // Requesting the input state needs a device that can compile online.
        sycl::kernel_id add_id = sycl::get_kernel_id<AddKernel>();
        auto bundle = sycl::get_kernel_bundle<sycl::bundle_state::input>(
            q.get_context(), std::vector<sycl::kernel_id>{add_id});
        auto built = sycl::build(bundle);

        // Submit the kernel for execution using the kernel_bundle
        q.submit([&](sycl::handler& h) {
            // use_kernel_bundle is a member of the handler, not a free function
            h.use_kernel_bundle(built);

            auto acc_a = buf_a.get_access<sycl::access::mode::read>(h);
            auto acc_b = buf_b.get_access<sycl::access::mode::read>(h);
            auto acc_c = buf_c.get_access<sycl::access::mode::write>(h);

            // Create an instance of the AddKernel functor with the necessary data
            AddKernel addKernel(acc_a, acc_b, acc_c);

            // Execute the kernel; it is matched to the bundle by its kernel ID
            h.parallel_for(sycl::range<1>(a.size()), addKernel);
        });

        // Wait for the kernel execution to complete
        q.wait();
    }

    // Print the results
    for (std::size_t i = 0; i < c.size(); i++) {
        std::cout << a[i] << " + " << b[i] << " = " << c[i] << std::endl;
    }
    return 0;
}