OpenCL on macOS: SIGABRT in release build, EXC_BAD_INSTRUCTION in libdispatch in debug build when using AMD Radeon Pro 555 as CL device

Question

I'm encountering a hard-to-track-down bug on macOS in an OpenCL-based application. In a release build, my code crashes with a SIGABRT at some point; in a debug build, I get an EXC_BAD_INSTRUCTION on a thread that is obviously managing some libdispatch/GCD work (com.apple.libdispatch-manager). Note that I do not call any GCD-related functions myself, so I assume this is done by the Apple OpenCL runtime in the background.

The context is a benchmarking application that measures the latency between enqueuing CL commands and receiving the CL_COMPLETE callback, for various ways of accessing the CL buffers. The code is shown below. The error only occurs for one of the three CL devices available on my MacBook Pro (the AMD Radeon Pro 555 Compute Engine).

Relevant part of the code:

```cpp
nlohmann::json performTestUseHostPtr()
{
    nlohmann::json results;

    std::vector<cl::Event> inputBufferEvent  (1);
    std::vector<cl::Event> outputBufferEvent (1);
    std::vector<cl::Event> kernelEvent       (1);

    for (auto size : testSizes)
    {
        std::vector<float> inputBufferHost  (size);
        std::vector<float> outputBufferHost (size);

        cl::Buffer inputBuffer  (context, CL_MEM_USE_HOST_PTR | CL_MEM_READ_ONLY,  size * sizeof (float), inputBufferHost.data());
        cl::Buffer outputBuffer (context, CL_MEM_USE_HOST_PTR | CL_MEM_WRITE_ONLY, size * sizeof (float), outputBufferHost.data());

        void* inputBufferMapped = queue.enqueueMapBuffer (inputBuffer, CL_TRUE, CL_MAP_WRITE_INVALIDATE_REGION, 0, size * sizeof (float));
        std::memcpy (inputBufferMapped, testData.data(), size * sizeof (float));

        kernel.setArg (0, inputBuffer);
        kernel.setArg (1, outputBuffer);

        for (int i = 0; i < numTests; ++i)
        {
            startTimes[i] = my::HighResolutionTimer::now();

            queue.enqueueUnmapMemObject (inputBuffer, inputBufferMapped, nullptr, &inputBufferEvent[0]);
            inputBufferEvent[0].setCallback (CL_COMPLETE, setTimestampCallback, &unmapCompletedTimes[i]);

            queue.enqueueNDRangeKernel (kernel, cl::NullRange, cl::NDRange (size), cl::NullRange, &inputBufferEvent, &kernelEvent[0]);
            kernelEvent[0].setCallback (CL_COMPLETE, setTimestampCallback, &kernelCompletedTimes[i]);

            void* outputBufferMapped = queue.enqueueMapBuffer (outputBuffer, CL_FALSE, CL_MAP_READ, 0, size * sizeof (float), &kernelEvent, &outputBufferEvent[0]);
            outputBufferEvent[0].setCallback (CL_COMPLETE, setTimestampCallback, &mapCompletedTimes[i]);

            inputBufferMapped = queue.enqueueMapBuffer (inputBuffer, CL_TRUE, CL_MAP_WRITE_INVALIDATE_REGION, 0, size * sizeof (float), &kernelEvent, nullptr);

            // --- Release build error seems to happen somewhere here ---

            queue.finish();

            std::memcpy (inputBufferMapped, outputBufferMapped, size * sizeof (float));

            queue.enqueueUnmapMemObject (outputBuffer, outputBufferMapped);
            queue.finish();
        }

        queue.enqueueUnmapMemObject (inputBuffer, inputBufferMapped);

        results["vecSize=" + std::to_string (size)] = calculateTimes();

        queue.finish();
    }

    return results;
}
```
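The question does not show `setTimestampCallback`, so the following is only a hypothetical reconstruction of the measurement pattern the benchmark describes, to make the data flow explicit. It assumes that the `user_data` pointer addresses a preallocated timestamp slot (such as `&kernelCompletedTimes[i]`) and that `my::HighResolutionTimer` behaves like `std::chrono::steady_clock`. A real OpenCL event callback has the signature `void CL_CALLBACK f(cl_event, cl_int, void*)`; the sketch uses `void*` and `int` instead so that it compiles without the OpenCL headers.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

using Clock = std::chrono::steady_clock;

// Hypothetical stand-in for setTimestampCallback. In real code the parameters
// would be (cl_event, cl_int, void*) and the function would be marked CL_CALLBACK.
void setTimestampCallbackSketch(void* /*event*/, int /*status*/, void* userData)
{
    // userData points into a vector that was sized before any command was enqueued,
    // so the callback never allocates and never resizes shared storage.
    *static_cast<Clock::time_point*>(userData) = Clock::now();
}

// Latency per test run, in microseconds, between the host-side start timestamp
// and the timestamp written by the completion callback.
std::vector<double> latenciesMicros(const std::vector<Clock::time_point>& start,
                                    const std::vector<Clock::time_point>& completed)
{
    std::vector<double> result;
    result.reserve(start.size());
    for (std::size_t i = 0; i < start.size() && i < completed.size(); ++i)
        result.push_back(std::chrono::duration<double, std::micro>(completed[i] - start[i]).count());
    return result;
}
```

The property that matters here is that the callback only writes into storage sized before the first command is enqueued; growing or freeing those vectors while callbacks can still fire would leave the runtime writing through dangling pointers.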

Notes:

I checked the error codes of all CL calls; all of them return CL_SUCCESS. I removed the checks from the code above for readability. I marked the line where I assume the error roughly happens; this is based on inserting print statements in the release build and observing which points of the code were reached before the fault occurs. Furthermore, inserting a print statement above the queue.finish(); statement makes the bug disappear, so this is likely timing-related.
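Because the bug looks timing-related, one hazard worth ruling out is the callbacks themselves. As far as I know, the OpenCL specification describes event callbacks as being invoked asynchronously and does not promise that they have all run by the time `clFinish` (`queue.finish()`) returns. If a late callback wrote into `unmapCompletedTimes`, `kernelCompletedTimes` or `mapCompletedTimes` after that storage had been read, reallocated or freed, heap corruption like the one in the update below would be a plausible symptom. The sketch below is a hypothetical helper (`PendingCallbacks`, `TimestampSlot` and `timestampAndNotify` are not part of any OpenCL API) that counts outstanding callbacks so the host can wait for all of them before calling something like `calculateTimes()`. It uses `std::thread` to simulate the runtime's callback thread.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical helper: counts callbacks that have been registered but not yet run.
class PendingCallbacks
{
public:
    // Call once per setCallback(), before the callback can possibly fire.
    void add()
    {
        std::lock_guard<std::mutex> lock(mutex_);
        ++pending_;
    }

    // Call at the very end of the callback body. Notifying while holding the lock
    // keeps waitAll() from returning (and the tracker from being destroyed) early.
    void done()
    {
        std::lock_guard<std::mutex> lock(mutex_);
        if (--pending_ == 0)
            allDone_.notify_all();
    }

    // Blocks until every registered callback has called done().
    void waitAll()
    {
        std::unique_lock<std::mutex> lock(mutex_);
        allDone_.wait(lock, [this] { return pending_ == 0; });
    }

private:
    std::mutex mutex_;
    std::condition_variable allDone_;
    std::size_t pending_ = 0;
};

// Context passed as user_data: where to store the timestamp, and whom to notify.
struct TimestampSlot
{
    std::chrono::steady_clock::time_point* target;
    PendingCallbacks* tracker;
};

// Callback body (hypothetical; a real one would take cl_event, cl_int, void*).
void timestampAndNotify(void* userData)
{
    auto* slot = static_cast<TimestampSlot*>(userData);
    *slot->target = std::chrono::steady_clock::now();
    slot->tracker->done();  // last access to shared state
}

// Demo: simulate the runtime firing `count` callbacks from other threads.
bool demoWithSimulatedRuntime(std::size_t count)
{
    PendingCallbacks tracker;
    std::vector<std::chrono::steady_clock::time_point> times(count);
    std::vector<TimestampSlot> slots;
    for (auto& t : times)
        slots.push_back(TimestampSlot{&t, &tracker});

    for (std::size_t i = 0; i < count; ++i)
        tracker.add();  // register before anything can fire

    std::vector<std::thread> runtimeThreads;
    for (auto& slot : slots)
        runtimeThreads.emplace_back(timestampAndNotify, &slot);

    tracker.waitAll();  // only now is it safe to read `times`
    bool allWritten = true;
    for (const auto& t : times)
        allWritten = allWritten && t != std::chrono::steady_clock::time_point{};

    for (auto& th : runtimeThreads)
        th.join();
    return allWritten;
}
```

In the question's loop, the equivalent would be one `add()` before each `setCallback` and a single `waitAll()` before `calculateTimes()` and before the timestamp vectors go out of scope.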

Update:

After inserting a short sleep at the line where I assumed the error happens, the debug build now also triggers a SIGABRT. Additionally, the following messages appear on the console:

OpenCLLatencyTests(17903,0x10012a5c0) malloc: tiny_free_list_remove_ptr: Internal invariant broken (next ptr of prev): ptr=0x1003052d0, prev_next=0x0
OpenCLLatencyTests(17903,0x10012a5c0) malloc: *** set a breakpoint in malloc_error_break to debug
Signal: SIGABRT (signal SIGABRT)
E0412 11:55:02.898913 233472000 ProtobufClient.cpp:63] No such process
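The `malloc: tiny_free_list_remove_ptr: Internal invariant broken` message means the allocator found its own bookkeeping inconsistent, which usually indicates that some code wrote to heap memory it did not own. One cheap way to narrow this down, independent of OpenCL, is to surround each host buffer with canary values and check them after every iteration. The sketch below is a hypothetical debugging helper (`GuardedHostBuffer` and `kCanary` are not from the question); its `data()` pointer could be passed to the `cl::Buffer` constructor in place of `inputBufferHost.data()`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Arbitrary sentinel value; an overrun that happens to write exactly this value goes unnoticed.
constexpr float kCanary = -12345.0f;

// Hypothetical debugging helper: a float buffer with guard regions on both sides.
class GuardedHostBuffer
{
public:
    explicit GuardedHostBuffer(std::size_t size, std::size_t guard = 64)
        : size_(size), guard_(static_cast<std::ptrdiff_t>(guard)), storage_(size + 2 * guard, 0.0f)
    {
        std::fill(storage_.begin(), storage_.begin() + guard_, kCanary);
        std::fill(storage_.end() - guard_, storage_.end(), kCanary);
    }

    // Pointer to the usable region (what would be handed to cl::Buffer).
    float* data() { return storage_.data() + guard_; }
    std::size_t size() const { return size_; }

    // True if no guard value in front of or behind the usable region was overwritten.
    bool guardsIntact() const
    {
        auto isCanary = [](float v) { return v == kCanary; };
        return std::all_of(storage_.begin(), storage_.begin() + guard_, isCanary)
            && std::all_of(storage_.end() - guard_, storage_.end(), isCanary);
    }

private:
    std::size_t size_;
    std::ptrdiff_t guard_;
    std::vector<float> storage_;
};
```

Checking `guardsIntact()` after each `queue.finish()` would show whether the device or one of the `memcpy` calls ever writes past a host buffer. It only catches overruns adjacent to the buffer, not writes into freed memory such as a late callback writing into released timestamp storage; Address Sanitizer can catch both kinds of error in instrumented code.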

Question:

Can anyone spot an obvious error in my code?
If not, are there any known bugs in the Apple OpenCL implementation that could cause errors like that?

Not sure about known bugs in OpenCL, but note that Apple has deprecated OpenGL and OpenCL (sadly), so don't expect any sort of help from their side. At least their OpenGL implementation is extremely outdated (and buggy). — Acorn, Apr 12 '19 at 10:04
I know that they deprecated it, and I don't expect any help from Apple. My work is not focused on Apple only; the main application I'm working on is in fact Linux-based. However, I work on macOS most of the time, so I'd love to be able to continue most of my work in my native development environment until some final point. This means: if it is a known bug, I won't put any effort into a workaround, but if my code has errors, I'm obviously interested in fixing them! — PluginPenguin, Apr 12 '19 at 10:10
In that case, it seems to me that the best approach is to test your code on Linux. If the bug appears there too, then you at least know the bug is likely on your side. — Acorn, Apr 12 '19 at 10:11

0 Answers