
I've recently started helping on an existing project that has worked fine in the past. It uses POSIX shared memory for IPC between Python and C++ code for Unreal Engine 4.27. The code recently started having issues. They seem to have started within the last month, but they could have been going on longer and been hidden by other problems we had to resolve in our CI setup (NVIDIA driver and Docker image issues, now resolved). At random points while running our tests, one specific line of code causes a SIGBUS error. That line and the related code are below, with some log statements removed.

First snippet, where the error is raised:

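    // Pointer name unified here; the snippet as posted assigned to boolPtr but tested ShouldReadBufferPtr.
    ShouldReadBufferPtr = static_cast<bool*>(Server->Malloc(TCHAR_TO_UTF8(*NAME), 1 * sizeof(bool)));
    if (ShouldReadBufferPtr != nullptr) {
        *ShouldReadBufferPtr = false;  // SIGBUS is raised here
    }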

Second snippet, the Malloc definition:

void* UServer::Malloc(const std::string& Key, unsigned int Size) {
    // If this key doesn't already exist, or the buffer size has changed, allocate the memory.
    if (!Memory.count(Key) || Memory[Key]->Size() != Size) {
        Memory[Key] = std::unique_ptr<SharedMemory>(new SharedMemory(Key, Size, TCHAR_TO_UTF8(*UUID)));
    }
    return Memory[Key]->GetPtr();
}

Snippet of the shared memory implementation:


    MemFile = shm_open(MemPath.c_str(), O_CREAT | O_RDWR, 0777);
    if (MemFile == -1) {
        LogSystemError("Unable to create shared memory buffer");
    }

    int status = ftruncate(MemFile, this->MemSize);
    if (status == -1) {
        LogSystemError("Failed to truncate file");
    }

    MemPointer = static_cast<void*>(mmap(nullptr, this->MemSize, PROT_READ | PROT_WRITE,
                                         MAP_SHARED, MemFile, 0));
    if (MemPointer == MAP_FAILED) {
        LogSystemError("Failed to map shared memory");
    }

    // Doesn't need to stay open
    close(MemFile);
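
For comparison, here is a minimal sketch of a stricter version of the sequence above. As far as I understand, a SIGBUS on a write to a MAP_SHARED mapping usually means the touched page has no backing store. That can happen when the file is shorter than the mapping, or when the tmpfs behind /dev/shm has run out of space. The snippet above only logs failures and keeps going, so a failed ftruncate would still be followed by an mmap and a write. The sketch stops at the first failure. It also calls posix_fallocate so that an out-of-space condition shows up as an error code instead of a signal on first write. The name MapSharedChecked is hypothetical and is not part of the project's SharedMemory class. I'm assuming Linux and a non-zero Size.

```cpp
#include <cerrno>
#include <cstddef>
#include <fcntl.h>
#include <string>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Hypothetical helper (not the project's SharedMemory class).
// Returns the mapped pointer, or nullptr with errno set on failure.
// Path must start with '/', Size must be greater than zero.
void* MapSharedChecked(const std::string& Path, std::size_t Size) {
    int Fd = shm_open(Path.c_str(), O_CREAT | O_RDWR, 0600);
    if (Fd == -1) {
        return nullptr;
    }

    // Set the object's length; stop here if that fails.
    if (ftruncate(Fd, static_cast<off_t>(Size)) == -1) {
        int Saved = errno;
        close(Fd);
        errno = Saved;
        return nullptr;
    }

    // Reserve the backing pages now. posix_fallocate returns the error
    // number directly (for example ENOSPC) rather than setting errno.
    int Err = posix_fallocate(Fd, 0, static_cast<off_t>(Size));
    if (Err != 0) {
        close(Fd);
        errno = Err;
        return nullptr;
    }

    void* Ptr = mmap(nullptr, Size, PROT_READ | PROT_WRITE, MAP_SHARED, Fd, 0);
    int Saved = errno;
    close(Fd);  // the mapping stays valid after the descriptor is closed
    if (Ptr == MAP_FAILED) {
        errno = Saved;
        return nullptr;
    }
    return Ptr;
}
```

If this returned nullptr with errno set to ENOSPC inside the CI container, that would point at the size of /dev/shm rather than at the write itself. I have not confirmed that this is the cause.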

I've been trying to figure this out. I read that POSIX shared memory works best when the memory size is a multiple of the page size, so I tried rounding every allocation up to that, but it didn't fix the error. I also can't get GDB to step into the C++ code, so I can't step through it or watch the memory address.
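
The page rounding mentioned above amounts to roughly the following sketch. The helper name RoundUpToPage is hypothetical and this is not the exact project code. It only assumes that sysconf(_SC_PAGESIZE) reports the page size.

```cpp
#include <cstddef>
#include <unistd.h>

// Hypothetical helper: rounds Size up to the next multiple of the system
// page size. Returns Size unchanged if the page size cannot be queried.
std::size_t RoundUpToPage(std::size_t Size) {
    const long Page = sysconf(_SC_PAGESIZE);
    if (Page <= 0) {
        return Size;
    }
    const std::size_t P = static_cast<std::size_t>(Page);
    return ((Size + P - 1) / P) * P;
}
```

With this, a request for sizeof(bool) bytes becomes a request for one full page before it reaches ftruncate and mmap.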

The error seems to occur only on this line. It doesn't happen on every run, but on the computers I've tested, it seems to happen at the same memory address across repeated tests on the same computer.

I've tried several computers. It seems to come up on every Ubuntu 22.04 machine I try, and it also fails on some Ubuntu 20.04 machines, but not always. I don't know why this would be related, but on Ubuntu 20.04 machines, NVIDIA driver 525 seems to work while 535 does not.
