I am running a UDP socket from a C/C++ library and passing in a callback from Python.

The callback runs fine until I attempt to modify a member variable of the Python application. When I do, I get a segmentation fault (signal 11) after an arbitrary amount of time.

I am curious whether this means I need to handle the GIL by wrapping my callback call in Py_BEGIN_ALLOW_THREADS and Py_END_ALLOW_THREADS: https://docs.python.org/3/c-api/init.html#thread-state-and-the-global-interpreter-lock

If possible, I would like to avoid including <Python.h>, as this is an abstracted library intended to also be compatible with .NET.

.cpp callback definition

#ifdef _WIN32
typedef void(__stdcall* UDPReceive)(const char* str);
#else
typedef void (*UDPReceive)(const char* str);
#endif

.cpp thread launch

ReceiveThread = std::async(std::launch::async, &MLFUDP::ReceivePoller, this, callback);

.h ReceiveCallback

UDPReceive ReceiveCallback = nullptr;

.cpp receive thread that triggers python callback

void UDP::ReceivePoller(UDPReceive callback)
{
    ReceiveCallback = callback;
    ReceiverRunning = true;

    UDPLock *receiveLock = new UDPLock();

#ifdef _WIN32
    int socketLength = sizeof(ClientAddr);
    int flags = 0;
#else
    socklen_t socketLength = sizeof(ClientAddr);
    int flags = MSG_WAITALL;
#endif

    int result;
    char buffer[MAXLINE];
    while(ReceiverRunning)
    {
        try {
            memset(buffer,'\0', MAXLINE);
            result = recvfrom(RecvSocketDescriptor,
                              (char*)buffer,
                              MAXLINE,
                              flags,
                              (struct sockaddr*)&ClientAddr,
                              &socketLength);
#ifdef _WIN32
            if (result == SOCKET_ERROR)
            {
                Log::LogErr("UDP Received error: " + std::to_string(WSAGetLastError()));
            }
#else
            if(result < 0)
            {
                Log::LogErr("UDP Received error: " + std::to_string(result));
            }
#endif
            buffer[result] = '\0';

#ifdef _WIN32
            char* data = _strdup(buffer);
#else
            char* data = strdup(buffer);
#endif
            //handle overflow (strdup returns nullptr on allocation failure)
            if(data == nullptr) {continue;}
            receiveLock->Lock();
            //Fire Callback
            ReceiveCallback(data); 
            receiveLock->Unlock();

        }
        catch(...)
        {
            //okay, we want graceful exit when killing socket on close
        }
    }

}

.py lib initialization

    def __init__(self, udp_recv_port, udp_send_port):
        libname = ""
        if platform == "win32":
            print("On Windows")
            libname = pathlib.Path(__file__).resolve().parent / "SDK_WIN.dll"
        elif platform == "darwin":
            print("on Mac")
            libname = pathlib.Path(__file__).resolve().parent / "SDK.dylib"
            print(libname)
        elif platform == "linux":
            print("on linux")

        UDP_TYPE_BIND = 0

        #Load dynamic library
        self.sdk = CDLL(str(libname))

        callback_type = CFUNCTYPE(None, c_char_p)
        log_callback = callback_type(sdk_log_function)
        self.sdk.InitLogging(2, log_callback)

        recv_callback = callback_type(self.sdk_recv_callback)
        self.sdk.InitUDP(udp_recv_port, udp_send_port, UDP_TYPE_BIND, recv_callback)

.py recv_callback definition

If I run this callback, everything works fine; I have spammed it with a few million messages.

    @staticmethod
    def sdk_recv_callback(message):
        print(message.decode('utf-8'))
        string_data = str(message.decode('utf-8'));
        if len(string_data) < 1:
            print("Returning")
            return

Yet if I then add this message to a thread-safe FIFO queue.Queue(), I get segfault 11 after an arbitrary (short) amount of time while receiving messages.

    @staticmethod
    def sdk_recv_callback(message):
        print(message.decode('utf-8'))
        string_data = str(message.decode('utf-8'));
        if len(string_data) < 1:
            print("Returning")
            return

        message_queue.put(string_data)

.py poller function ingesting message queue

    def process_messages(self):
        while self.is_running:
            string_message = message_queue.get();
            data = json.loads(string_message);
            print(data)

Most of this I am learning as I go (in a silo), so I think there is a large chance I am possibly missing something basic/fundamental. I would greatly appreciate any advice on better approaches or just another set of eyes. Thank you.

This is currently being compiled on macOS with CMake on an M1 chip.

Lucas Moskun

4 Answers

It turns out I did not need to use Python.h in my C library to handle the GIL. Since I am using ctypes, it "magically" handles the GIL by spinning up a temporary Python thread each time the callback is called (which is nicely detailed here).

The segfault showed up once I added the process_messages function, which I run from a thread. It was caused by initializing the ctypes library from inside a class. Instead, I now initialize the SDK in main and pass a reference to the class:

import faulthandler
import pathlib
from ctypes import CDLL
from sys import platform

if __name__ == "__main__":
    faulthandler.enable()
    libname = ""
    if platform == "win32":
        print("On Windows")
        libname = pathlib.Path(__file__).resolve().parent / "SDK_WIN.dll"
    elif platform == "darwin":
        print("on Mac")
        libname = pathlib.Path(__file__).resolve().parent / "MLFSDK.dylib"
        print(libname)
    elif platform == "linux":
        print("on linux")

    # Load the library once, at module level, and hand the handle to the app
    sdk = CDLL(str(libname))

    app = the_app(sdk, 6666, 7777)

After this all of the threads played along.
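
A related pitfall that is worth ruling out in a setup like this: the ctypes documentation warns that objects created with CFUNCTYPE() must stay referenced for as long as C code may call them, otherwise they can be garbage collected and the next callback crashes. Whether that was the actual root cause here is an assumption, not something confirmed above. The sketch below keeps the callback object on the instance; the Receiver class and its attribute names are hypothetical and only illustrate the pattern.

```python
import queue
from ctypes import CFUNCTYPE, c_char_p

# Callback signature matching `void (*UDPReceive)(const char* str)`.
RECV_CALLBACK = CFUNCTYPE(None, c_char_p)


class Receiver:
    """Hypothetical wrapper; names and layout are illustrative only."""

    def __init__(self):
        self.messages = queue.Queue()
        # Keep the ctypes callback object on the instance so it stays
        # alive for as long as native code may call it.
        self._recv_callback = RECV_CALLBACK(self._on_message)

    def _on_message(self, message):
        # ctypes passes a c_char_p argument to Python as bytes
        # (or None for a NULL pointer).
        if not message:
            return
        self.messages.put(message.decode("utf-8", errors="replace"))

    @property
    def c_callback(self):
        # This is the object to hand to the native library.
        return self._recv_callback


receiver = Receiver()
# Calling the ctypes object directly exercises the same wrapper that
# native code would go through.
receiver.c_callback(b'{"id": 1}')
print(receiver.messages.get_nowait())
```

With the SDK from the question, receiver.c_callback would be the value passed as the last argument to InitUDP, in place of a local variable that goes out of scope when __init__ returns.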

Lucas Moskun

Hmm, this is a complicated one. The only thing I can think of is a buffer overflow in the UDP::ReceivePoller function. You declare an array with char buffer[MAXLINE];. Say MAXLINE is 1024, for example, so buffer is a block of 1024 bytes. Fine. Then you memset buffer to \0 for 1024 bytes. Fine. Then you do

result = recvfrom(RecvSocketDescriptor,
                  (char*)buffer,
                  MAXLINE,
                  flags,
                  (struct sockaddr*)&ClientAddr,
                  &socketLength);

which can in theory read up to 1024 bytes from the socket, returning 1024 in result with buffer holding the 1024 bytes read. Then you set buffer[result] = '\0';, which writes a null to index 1024 of buffer. However, indices start at 0, not 1, so the valid indices run from 0 to 1023 and that write lands 1 byte past the end of the 1024 reserved bytes. I guess that is fine for a little while (since it is only 1 byte off), but eventually the buffer ends up next to something it is not supposed to touch in memory and it segfaults. So my GUESS is to either:

a) update the recvfrom(...) call to read only MAXLINE - 1 bytes. This way you read at most 1023 bytes from the socket, leaving the 1024th byte in the buffer for the null terminator.

or

b) update the buffer to be char buffer[MAXLINE + 1]; to give the 1 extra byte (remember to update the memset as well, to MAXLINE + 1).

testfile
  • Thank you for taking a look. I gave this a shot (and am leaving it in, because it seems like an obvious error). However, it did not solve the problem. If I just print(message) from the callback, I have been able to receive 2 million+ messages on the socket; the segfault only rears its head when I start to add complexity/functions from imported libraries, or even just a simple JSON decode. – Lucas Moskun Apr 23 '22 at 18:01

Part of the problem might be that since you are getting a seg fault, it's hard to get information about where the error occurs.

You may want to import faulthandler and call faulthandler.enable() at the start of your program (see https://docs.python.org/3/library/faulthandler.html#faulthandler.enable). Using faulthandler can provide some minimal stack trace information on a segfault and help you find the problem.
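
As a minimal sketch of that suggestion, the snippet below enables faulthandler for all threads and sends its output to a log file. The log path is a hypothetical choice; calling faulthandler.enable() with no arguments writes to sys.stderr instead.

```python
import faulthandler
import os
import tempfile

# Hypothetical log location; any writable path works.
log_path = os.path.join(tempfile.gettempdir(), "sdk_faulthandler.log")

# The file object must stay open for as long as the handler is enabled.
fault_log = open(log_path, "w")

# Dump the Python traceback of every thread on SIGSEGV, SIGFPE, SIGABRT,
# SIGBUS and SIGILL.
faulthandler.enable(file=fault_log, all_threads=True)
```

Because the callback in the question runs on a thread started by the native library, dumping all threads is likely more informative than dumping only the thread that crashed.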

oxer

From past experience with Cython's with gil construct, you'll need to acquire the GIL when calling back into Python.

From the Python docs, it sounds like you'll need to call PyGILState_Ensure() to acquire the GIL and PyGILState_Release() to release it.

Mark Mann