
My app is crashing with an unhandled "Cannot close an uninitialised Msg" exception. This is probably caused by accessing a socket from multiple threads.

I am having trouble debugging this because, as far as I can see from reviewing my code, every access to the socket happens on the poller thread: either directly in the ReceiveReady event handler (which, as I understand it, runs on the poller thread by definition), or in a manually created Task (`new Task(...)`) that is then started on the poller (`task.Start(poller)`). So I don't see where the cross-thread access could happen.

The second problem is that the exception is unhandled: I wrap all sending and receiving in try-catch, yet the exception is thrown somewhere outside those blocks.

I am looking for an effective way to debug this and pinpoint the place in my code that misbehaves.


Code examples: as I wrote, I use only two "patterns".

Using the poller thread directly (the events are fired on the poller's thread):

 private async void OnMessageReceiveReady(object sender, NetMQSocketEventArgs args)
 {
     NetMQSocket socket = args.Socket;

     NetMQMessage mq_msg = socket.ReceiveMultipartMessage();
     ...

Switching to the poller's thread from an arbitrary thread:

Task sending = new Task(() =>
{
    foreach (NetMQFrame address in mq_envelope)
        socket.SendMoreFrame(address.ConvertToString());

    socket.SendFrame(response_data);
});
sending.Start(this.sharedPoller);
await sending.ConfigureAwait(false);
astrowalker
  • Can you show us some of the relevant code (as much as you can)? – mjwills Jul 26 '17 at 10:40
  • Is the code locked where you are reading/writing received data? The ReceiveReady event handler should do very little with the data. It should read the data, put it into a buffer and continue. The processing of the buffer should be handled in a different thread or in the main code. – jdweng Jul 26 '17 at 10:48
  • @mjwills, I updated the question -- I use this code multiple times, that's all. – astrowalker Jul 26 '17 at 10:50
  • @jdweng, you mean a `lock` statement? No: events are fired on the poller's thread, so only one can be running at any given time, correct? Thank you for the tip, but that is more of a performance issue than a multithreading one. Is it possible to get such an exception because the `ReceiveReady` code takes a bit longer? – astrowalker Jul 26 '17 at 10:52
  • TCP datagrams have a maximum size of about 1500 bytes. You can get datagrams of zero bytes (keep-alive messages), and datagrams can be split into smaller datagrams and combined by routers and servers. So the data sent isn't always received in the same chunks it was sent in, and the receiving parser has to be smart enough to handle partial messages. In my ReceiveReady method I just add the new data to a List object. Then a parser removes one message at a time from the List. For ASCII data, my parser reads from the List until it finds the first return, then removes that line inside locked code, so RxReady is locked. – jdweng Jul 26 '17 at 11:14
  • @jdweng, this is the first time I've heard that ZMQ can deliver a fraction of a message; AFAIK it is designed on an "all or nothing" principle (messages can be dropped on some sockets when the HWM is reached, but that is a different story). This is a side note anyway: I don't see how this issue is relevant to debugging the exception I wrote about. – astrowalker Jul 26 '17 at 11:31
  • It is in the RFC specification for TCP. My response wasn't directly related to debugging, but if the architecture of the code wasn't designed to handle all possible situations, then the code needs to be redesigned. Microsoft's implementation of events uses timers to transfer data between buffers, and these timers are not synchronous with the received TCP data. With multi-hop connections, messages can take different routes. Firewalls use port forwarding, which also isn't synchronized with the datagrams. The implementation of datagrams can vary between vendors, so the 1504 bytes can be 1500 bytes + 4 bytes. – jdweng Jul 26 '17 at 11:59

1 Answer


Unfortunately, I didn't find any method other than trial and error and more logging.
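One way to make that logging targeted is a small thread-affinity check called right before every socket operation. This is a hypothetical debugging aid (`ThreadAffinityChecker` is not a NetMQ type), and the sketch assumes `Check()` is placed only at call sites that are supposed to run on the poller thread, so the first thread it records is the poller thread. Any later call from a different thread is counted and logged with the caller's member name, file and line.

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Threading;

// Hypothetical debugging aid (not part of NetMQ): remembers the managed thread
// that first calls Check() and logs every later call made from another thread.
public sealed class ThreadAffinityChecker
{
    private int _ownerThreadId = -1; // -1 = no call recorded yet (real ids are positive)
    private int _violations;
    private readonly string _name;

    public ThreadAffinityChecker(string name) => _name = name;

    // Number of calls made from a thread other than the first one seen.
    public int Violations => Volatile.Read(ref _violations);

    // Call right before each Send*/Receive*/Remove/Dispose on the socket.
    // The caller info attributes are filled in by the compiler.
    public void Check([CallerMemberName] string caller = "",
                      [CallerFilePath] string file = "",
                      [CallerLineNumber] int line = 0)
    {
        int current = Environment.CurrentManagedThreadId;

        // Atomically claim ownership for the first thread that gets here;
        // returns the previous owner (-1 if this call just claimed it).
        int owner = Interlocked.CompareExchange(ref _ownerThreadId, current, -1);

        if (owner != -1 && owner != current)
        {
            Interlocked.Increment(ref _violations);
            Console.Error.WriteLine(
                $"[{_name}] used on thread {current} (owner {owner}) in {caller} at {file}:{line}");
        }
    }
}
```

For example, calling `checker.Check();` as the first statement of `OnMessageReceiveReady` and inside the `sending` task would print the exact location of any access that does not actually run on the poller thread.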

The problem turned out to be disposing sockets: I have a running (shared) poller, and I tried to Remove and Dispose a socket, but I found out that those two methods are asynchronous.

As a solution, I group Remove and Dispose together in a separate task and schedule it to run on the poller. With the task in hand, I can call Wait on it, and this way I get blocking, synchronous behaviour in my Dispose.
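A minimal sketch of that approach, assuming NetMQ 4 with a shared `NetMQPoller` that is already running (e.g. started with `RunAsync()`). `PollerCleanup.RemoveAndDisposeSync` is a hypothetical helper name. It relies on the poller being a `TaskScheduler`, which is what the question's own `sending.Start(this.sharedPoller)` already uses. It is meant for threads other than the poller thread; code already running on the poller can call `Remove` and `Dispose` directly.

```csharp
using System;
using System.Threading.Tasks;
using NetMQ;

public static class PollerCleanup
{
    // Hypothetical helper: removes the socket from the poller and disposes it,
    // both on the poller thread, and blocks the caller until that has finished.
    // Assumes the poller is running; otherwise the task is never executed and
    // Wait() blocks forever. Not intended to be called from the poller thread.
    public static void RemoveAndDisposeSync(NetMQPoller poller, NetMQSocket socket)
    {
        if (poller == null) throw new ArgumentNullException(nameof(poller));
        if (socket == null) throw new ArgumentNullException(nameof(socket));

        var cleanup = new Task(() =>
        {
            poller.Remove(socket); // stop polling the socket first...
            socket.Dispose();      // ...then close it on the same thread
        });

        cleanup.Start(poller); // NetMQPoller is a TaskScheduler: runs on the poller thread
        cleanup.Wait();        // blocks; rethrows failures wrapped in an AggregateException
    }
}
```

Because both calls run on the poller thread, there is no window in which the poller is still polling a socket that another thread has already closed.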

astrowalker