I have a problem with code that uses the OpenCL capabilities of my GPU. Specifically, I am developing this project: https://github.com/alekstheod/tnnlib
The OpenCL-related code is located here: https://github.com/alekstheod/tnnlib/tree/master/NeuralNetwork/NeuralLayer/OpenCL
The interesting part is this function:
void calculate() {
    try {
        using namespace cl;
        auto& ocl = OpenCLProgram::instance();
        const auto& defaultDevice = ocl.devices.front();

        // Create the buffers from the host-side vectors
        const cl_mem_flags inBufFlags = CL_MEM_READ_ONLY | CL_MEM_USE_HOST_PTR;
        const cl_mem_flags outBufFlags = CL_MEM_WRITE_ONLY | CL_MEM_USE_HOST_PTR;
        Buffer weights(ocl.context,
                       inBufFlags,
                       bufferSize * sizeof(float),
                       m_weights.data());
        Buffer values(ocl.context,
                      inBufFlags,
                      bufferSize * sizeof(float),
                      m_inputs.data());
        Buffer product(ocl.context,
                       outBufFlags,
                       size() * sizeof(float),
                       m_dotProducts.data());

        // Create a command queue on the first device
        CommandQueue queue(ocl.context, defaultDevice);
        Kernel kernel{ocl.program, "dot_product"};

        // Set the kernel arguments
        kernel.setArg(0, weights);
        kernel.setArg(1, values);
        kernel.setArg(2, product);
        kernel.setArg(3, static_cast< unsigned int >(Internal::inputs()));

        queue.enqueueNDRangeKernel(kernel,
                                   cl::NullRange,
                                   cl::NDRange(size()),
                                   cl::NullRange);
        // Blocking read of the results back into m_dotProducts
        queue.enqueueReadBuffer(product,
                                CL_TRUE,
                                0,
                                m_dotProducts.size() * sizeof(float),
                                m_dotProducts.data());

        auto& self = *this;
        for(const auto i : ranges::views::indices(size())) {
            m_dotProducts[i] += self[i].getBias();
        }
        for(const auto i : ranges::views::indices(size())) {
            auto& neuron = self[i];
            neuron.calculateOutput(m_dotProducts[i],
                                   m_dotProducts.begin(),
                                   m_dotProducts.end());
        }
    } catch(const cl::Error& e) {
        // Log the actual OpenCL error name and code, not just a generic message
        std::cerr << "Calculation error: " << e.what() << " (" << e.err() << ")"
                  << std::endl;
    }
}
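The `dot_product` kernel source itself is not shown above. My assumption is that each work-item `i` computes the dot product of row `i` of the weights matrix with the input vector. A plain host-side sketch of that assumed computation, which I use to cross-check the GPU results (the names `hostDotProduct`, `numNeurons`, and `numInputs` are mine, not from the project):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Host-side reference for the assumed dot_product kernel:
//   out[i] = sum over j of weights[i * numInputs + j] * inputs[j]
std::vector<float> hostDotProduct(const std::vector<float>& weights,
                                  const std::vector<float>& inputs,
                                  std::size_t numNeurons,
                                  std::size_t numInputs) {
    assert(weights.size() == numNeurons * numInputs);
    assert(inputs.size() == numInputs);
    std::vector<float> out(numNeurons, 0.0f);
    for (std::size_t i = 0; i < numNeurons; ++i) {   // one "work-item" per output
        for (std::size_t j = 0; j < numInputs; ++j) {
            out[i] += weights[i * numInputs + j] * inputs[j];
        }
    }
    return out;
}
```

Comparing the GPU output against this on a small layer matches, so I believe the kernel logic itself is correct and the problem is in how I manage the buffers.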
What is wrong with my workflow here? If I change CL_MEM_USE_HOST_PTR to something like CL_MEM_COPY_HOST_PTR, it crashes my GPU: it runs for several cycles, but then the whole thing crashes. Can an OpenCL expert help me with that?