I'm debugging a strange problem that occurs on one of the machines in the live environment.

My app (slave) is supposed to receive UDP multicast messages at any time from another host (master) on the LAN, but apparently it does so only if the slave has previously sent a message.

What I expected is:

  1. Slave asks for data
  2. Master sends the data
  3. Slave receives and consumes
  4. Master waits 2-3 minutes
  5. Master sends new data
  6. Slave receives and consumes the new data
  7. Steps from 4 to 6 are repeated

What I see is:

  1. Slave doesn't receive anything

BUT if I make the slave ask for new data continuously (polling, i.e. repeating step 1), I eventually get the message.

I see in Wireshark that the message from the master does reach the slave host; only my app is not receiving it. Even more surprising, another master-slave pair running the same apps on the same network works fine, as does my pair in the test environment.

The slave app uses UdpClient in asynchronous mode. Here is how the listener is initialized:

private void ListenMain()
{
    try
    {
        UdpClient udpClient = new UdpClient();
        udpClient.Client.ExclusiveAddressUse = false;
        udpClient.Client.SetSocketOption(SocketOptionLevel.Socket, SocketOptionName.ReuseAddress, true);
        udpClient.Client.SetSocketOption(SocketOptionLevel.Socket, SocketOptionName.ReceiveTimeout, 1000);
        IPv4InterfaceProperties p = adapter.GetIPProperties().GetIPv4Properties();

        udpClient.Client.SetSocketOption(SocketOptionLevel.IP, SocketOptionName.MulticastInterface, (int)IPAddress.HostToNetworkOrder(p.Index));

        udpClient.Client.Bind(endPoint);
        udpClient.JoinMulticastGroup(12345);

        ListenState listenState = new ListenState();
        listenState.udpClient = udpClient;
        listenStates.Add(listenState);
        logger.Debug("Waiting for messages");
        udpClient.BeginReceive(new AsyncCallback(OnPacketReceived), listenState);
    }
    catch (Exception e)
    {
        logger.Error(e, "ListenMain() encountered an error");
    }
}

And here is the handler of a received packet:

private void OnPacketReceived(IAsyncResult result)
{
    logger.Trace("OnPacketReceived");
    IPEndPoint recvAddress = new IPEndPoint(IPAddress.Any, MULTICAST_PORT);
    ListenState state = result.AsyncState as ListenState;
    byte[] receive_byte_array;
    try
    {
        logger.Trace("before EndReceive");
        receive_byte_array = state.udpClient.EndReceive(result, ref recvAddress);
        logger.Trace("after EndReceive, got {0} bytes", receive_byte_array.Length);

        // packet handling goes here...

        // do the loop
        logger.Trace("waiting for another packet");
        state.udpClient.BeginReceive(new AsyncCallback(OnPacketReceived), state);
    }
    catch (ObjectDisposedException)
    {
        logger.Info("Socket is now closed");
        return;
    }
    catch (Exception e)
    {
        logger.Warn(e, "exception in handling incoming message");
    }
}

Of course, polling for new data is not an optimal solution and introduces unwanted delays. I'd like to know which phenomenon makes UdpClient lose incoming packets unless the same UdpClient has sent something before.

ris8_allo_zen0

1 Answer

I think there is an error in the code: udpClient.JoinMulticastGroup() takes the multicast group's IP address as its argument, not the port. Does it work when you fix this? If so, this explains it:

Not joining a multicast group leads to the typical "multicast group not joined" erratic behavior, which includes the favorite "it works for two to five minutes and then suddenly stops" and "it works when I send something in the other direction and then suddenly stops" and "it works when using a different multicast address and then stops, leaving unusable multicast addresses behind".

The behaviour you see is typical for IPv4 multicast with more or less intelligent routers and switches. They all support some version of IGMP snooping (with timeouts, bugs and incompatible versions), and routers, switches and OSes cache network paths and MACs and registered and unregistered multicast IPs for an undefined amount of time. This makes it impossible to reason about the behaviour in a logical way.

Check whether you joined the expected multicast group on the receiver/listener. When this looks ok and you still have problems, trace IGMP messages and look for anything which does not make sense, like never seeing a join, or seeing erratic leaves.

(Note that IGMP messages are sent by the OS on a machine level, and not by your application. This means that not every JoinMulticastGroup() will generate an IGMP join message.)
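
For reference, here is a minimal sketch of what a corrected join might look like. The question shows neither the group address nor endPoint, so the group address 239.0.0.222, the port 12345, and the names MulticastJoinSketch, IsIPv4Multicast and CreateListener are all assumptions made for illustration. It checks that the address really is an IPv4 multicast address before joining, and names the local interface explicitly by its unicast address.

```csharp
using System;
using System.Net;
using System.Net.Sockets;

public static class MulticastJoinSketch
{
    // Hypothetical values: substitute the group and port the master actually sends to.
    public static readonly IPAddress GroupAddress = IPAddress.Parse("239.0.0.222");
    public const int MulticastPort = 12345;

    // True for IPv4 addresses in 224.0.0.0/4, the IPv4 multicast range.
    public static bool IsIPv4Multicast(IPAddress address)
    {
        if (address.AddressFamily != AddressFamily.InterNetwork)
        {
            return false;
        }
        byte firstOctet = address.GetAddressBytes()[0];
        return firstOctet >= 224 && firstOctet <= 239;
    }

    // Binds to the multicast port and joins the group on the interface
    // that owns localAddress (a unicast address of this machine).
    public static UdpClient CreateListener(IPAddress localAddress)
    {
        if (!IsIPv4Multicast(GroupAddress))
        {
            throw new ArgumentException("Not an IPv4 multicast address: " + GroupAddress);
        }

        UdpClient udpClient = new UdpClient(AddressFamily.InterNetwork);
        udpClient.Client.SetSocketOption(
            SocketOptionLevel.Socket, SocketOptionName.ReuseAddress, true);
        udpClient.Client.Bind(new IPEndPoint(IPAddress.Any, MulticastPort));

        // Pass the group IP address (not the port) together with the local interface.
        udpClient.JoinMulticastGroup(GroupAddress, localAddress);
        return udpClient;
    }
}
```

You would call it as MulticastJoinSketch.CreateListener(IPAddress.Parse("192.168.1.10")), where 192.168.1.10 stands in for the slave's own address on the LAN, and then start BeginReceive on the returned client as before.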

Johannes Overmann