
I have a problem with my code when trying to use the OpenCL capabilities of my GPU. Specifically, I am developing this project: https://github.com/alekstheod/tnnlib

The OpenCL-related code is located here: https://github.com/alekstheod/tnnlib/tree/master/NeuralNetwork/NeuralLayer/OpenCL

Basically the interesting part is here:

    void calculate() {
        try {
            using namespace cl;
            auto& ocl = OpenCLProgram::instance();
            const auto& defaultDevice = ocl.devices.front();

            // Create a command queue and use the first device

            const cl_mem_flags inBufFlags = CL_MEM_READ_ONLY | CL_MEM_USE_HOST_PTR;
            const cl_mem_flags outBufFlags = CL_MEM_WRITE_ONLY | CL_MEM_USE_HOST_PTR;

            Buffer weights(ocl.context,
                           inBufFlags,
                           bufferSize * sizeof(float),
                           m_weights.data());

            Buffer values(ocl.context,
                          inBufFlags,
                          bufferSize * sizeof(float),
                          m_inputs.data());

            Buffer product(ocl.context,
                           outBufFlags,
                           size() * sizeof(float),
                           m_dotProducts.data());

            CommandQueue queue(ocl.context, defaultDevice);
            cl::Kernel kernel{ocl.program, "dot_product"};

            // Set arguments to kernel
            kernel.setArg(0, weights);
            kernel.setArg(1, values);
            kernel.setArg(2, product);
            kernel.setArg(3, static_cast< unsigned int >(Internal::inputs()));

            queue.enqueueNDRangeKernel(kernel,
                                       cl::NullRange,
                                       cl::NDRange(size()),
                                       cl::NullRange);

            queue.enqueueReadBuffer(product,
                                    CL_TRUE,
                                    0,
                                    m_dotProducts.size() * sizeof(float),
                                    m_dotProducts.data());

            auto& self = *this;
            for(const auto i : ranges::views::indices(size())) {
                m_dotProducts[i] += self[i].getBias();
            }

            for(const auto i : ranges::views::indices(size())) {
                auto& neuron = self[i];
                neuron.calculateOutput(m_dotProducts[i],
                                       m_dotProducts.begin(),
                                       m_dotProducts.end());
            }
        } catch(const cl::Error& e) {
            std::cerr << "Calculation error" << std::endl;
        }
    }

What is wrong with my workflow here? If I change CL_MEM_USE_HOST_PTR to something like CL_MEM_COPY_HOST_PTR, it crashes my GPU: it runs for several cycles, but then the whole thing crashes. Can some OpenCL expert help me with that?
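
For reference, this is roughly what I mean by the COPY_HOST_PTR variant. It is a sketch only, not the exact code that crashes, and it reuses ocl, bufferSize, m_weights, m_inputs and m_dotProducts from the method above; for the output buffer I drop the host pointer entirely and rely on the explicit read-back:

    // Sketch of the COPY_HOST_PTR variant (not the exact code that crashes).
    // With CL_MEM_COPY_HOST_PTR the runtime copies the host data once, at
    // buffer creation, so the host vectors are only needed at this point.
    Buffer weights(ocl.context,
                   CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR,
                   bufferSize * sizeof(float),
                   m_weights.data());

    Buffer values(ocl.context,
                  CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR,
                  bufferSize * sizeof(float),
                  m_inputs.data());

    // Output buffer without any host pointer: the results only reach
    // m_dotProducts through the explicit blocking read below.
    Buffer product(ocl.context,
                   CL_MEM_WRITE_ONLY,
                   size() * sizeof(float));

    // ... setArg() and enqueueNDRangeKernel() exactly as above ...

    queue.enqueueReadBuffer(product,
                            CL_TRUE, // blocking read
                            0,
                            m_dotProducts.size() * sizeof(float),
                            m_dotProducts.data());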

AlexTheo
  • Debugging OpenCL code can feel like telling someone over the phone how to debug their program. The usual approach I use is to first confirm OpenCL is working by running a program I know works, like clinfo. If that works okay, then check all the error codes returned by the API calls; that usually helps to narrow down the fault. Does your program work with USE_HOST_PTR? If so, you might just not have enough RAM available to use COPY_HOST_PTR, as it would try to create another copy of the buffers on your host. Try a less memory-demanding run if you can. – Simon Goater Mar 26 '23 at 22:02
  • That seems to me like a problem with accessing unallocated memory; check that bufferSize >= size()*Internal::inputs(). I would also not just trust these functions but instead do something like: int sz = inputs(); int n = size(); if(sz*n > bufferSize) raise "I am accessing memory which I did not allocate". – VojtaK Apr 24 '23 at 15:06
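
Following up on the two comments above, here is a sketch of how I could surface the failing call and add the suggested bounds check. It assumes the cl.hpp wrapper was built with __CL_ENABLE_EXCEPTIONS, so that cl::Error::what() names the failing API call and cl::Error::err() returns the OpenCL error code:

    #include <cassert> // for the bounds check below

    void calculate() {
        // Bounds check suggested by VojtaK: the kernel must not read past
        // what was allocated on the host side.
        assert(bufferSize >= size() * Internal::inputs());
        try {
            // ... body exactly as in the listing above ...
        } catch(const cl::Error& e) {
            // Report which API call failed and the OpenCL error code,
            // as Simon Goater suggests, instead of a generic message.
            std::cerr << "Calculation error in " << e.what()
                      << " (error code " << e.err() << ")" << std::endl;
        }
    }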

0 Answers