
I've been experiencing an issue with my server software: when one thread joins a multicast group, another thread may fail to receive a datagram arriving on a different multicast group at that same instant. I'm not sure whether this can be dismissed as expected loss due to the "unreliable nature" of UDP multicast, or whether it's a serious driver/NIC defect. A packet capture also shows a gap at that moment.
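For context on what a "join" involves at the socket level: the question doesn't show the server's code, so this is only a minimal sketch assuming IPv4 and the BSD/POSIX sockets API. A join is a `setsockopt(IP_ADD_MEMBERSHIP)` call; on Linux, for example, that both triggers an IGMP membership report and asks the driver to update the NIC's multicast receive filter. `make_mreq` and `join_group` are hypothetical helper names.

```c
#define _DEFAULT_SOURCE          /* expose struct ip_mreq on strict glibc builds */
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: fill an ip_mreq for `group` on the local interface
   whose IPv4 address is `iface_addr` ("0.0.0.0" lets the kernel choose).
   Returns 0 on success, -1 on a malformed or non-multicast group. */
int make_mreq(struct ip_mreq *mreq, const char *group, const char *iface_addr)
{
    memset(mreq, 0, sizeof *mreq);
    if (inet_pton(AF_INET, group, &mreq->imr_multiaddr) != 1)
        return -1;
    if (inet_pton(AF_INET, iface_addr, &mreq->imr_interface) != 1)
        return -1;
    /* IPv4 multicast is 224.0.0.0/4: the top four bits must be 1110. */
    if ((ntohl(mreq->imr_multiaddr.s_addr) & 0xF0000000u) != 0xE0000000u)
        return -1;
    return 0;
}

/* Hypothetical helper: join `group` on an already-bound UDP socket `fd`.
   Returns setsockopt()'s result, or -1 if the addresses are invalid. */
int join_group(int fd, const char *group, const char *iface_addr)
{
    struct ip_mreq mreq;
    if (make_mreq(&mreq, group, iface_addr) != 0)
        return -1;
    return setsockopt(fd, IPPROTO_IP, IP_ADD_MEMBERSHIP, &mreq, sizeof mreq);
}
```

Leaving a group uses the same `struct ip_mreq` with `IP_DROP_MEMBERSHIP`.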

I've observed this problem on multiple NIC models from different manufacturers, including Intel and HP. The reason I suspect a NIC or driver issue is that the problem doesn't occur at all while a packet sniffer has put the interface into promiscuous mode.

Is it possible that while an IGMP join or leave is updating the forwarding tables in the NIC, the NIC simply stops forwarding all multicast traffic for that moment? Is that acceptable behavior?
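Some background that bears on this: a NIC's hardware multicast filter matches Ethernet destination MAC addresses, not IP group addresses. Per RFC 1112, an IPv4 group maps to a MAC consisting of the prefix 01:00:5e followed by the low-order 23 bits of the group address, so 32 IPv4 groups share each MAC. The sketch below computes that mapping, which is roughly the address a driver programs into the NIC's filter when a join introduces a new group. The function names are hypothetical.

```c
#include <stdint.h>
#include <stdio.h>

/* Map an IPv4 multicast group (host byte order) to the Ethernet
   destination MAC a NIC filter matches on, per RFC 1112:
   01:00:5e followed by the low-order 23 bits of the group address. */
void ipv4_mcast_to_mac(uint32_t group, uint8_t mac[6])
{
    mac[0] = 0x01;
    mac[1] = 0x00;
    mac[2] = 0x5e;
    mac[3] = (uint8_t)((group >> 16) & 0x7f); /* top bit of this octet is dropped */
    mac[4] = (uint8_t)((group >> 8) & 0xff);
    mac[5] = (uint8_t)(group & 0xff);
}

/* Format the MAC as "xx:xx:xx:xx:xx:xx"; buf must hold at least 18 bytes. */
void format_mac(const uint8_t mac[6], char buf[18])
{
    snprintf(buf, 18, "%02x:%02x:%02x:%02x:%02x:%02x",
             mac[0], mac[1], mac[2], mac[3], mac[4], mac[5]);
}
```

For example, 239.1.2.3 and 224.129.2.3 both map to 01:00:5e:01:02:03, because the bit that distinguishes them is one of the five that the mapping discards.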

Marcin
  • Perhaps this would be better to ask on Server Fault? – Marcin May 29 '12 at 18:05
  • How do you get that "packet capture" without putting the NIC into promiscuous mode? – Nikolai Fetissov May 29 '12 at 18:39
  • @Nikolai You can capture without promiscuous mode. For example, using tshark with -p flag. http://www.wireshark.org/docs/man-pages/tshark.html – Marcin May 29 '12 at 18:43
  • And you are saying with all things being equal you don't lose packets with NIC in promiscuous mode? – Nikolai Fetissov May 29 '12 at 18:52
  • Promiscuous mode on = packets appear in capture and are received by server. Promiscuous mode off = packets missing in capture around IGMP join and leave requests, and sequence gaps reported by server. – Marcin May 29 '12 at 19:54
  • Well, the driver does need to reprogram the NIC hardware with the new MAC address. How many groups are you joining in total? I remember that e100 had only 64 hardware multicast filters, falling back to listening for all multicast packets and then filtering in software. – Nikolai Fetissov May 29 '12 at 20:34
  • I was able to reproduce this with just 2 multicast groups. Of course, the more groups involved, the more often the issue occurs. – Marcin May 29 '12 at 20:58
  • This does feel like a race in the driver somewhere, though that'd be hard to prove. I guess the workaround would be to try and join all the necessary groups beforehand. – Nikolai Fetissov May 29 '12 at 21:08
  • Unfortunately this is not possible, since it's a huge amount of bandwidth. The joins are supposed to be done on an as-needed basis. I'll continue investigating this myself and post an answer if I find anything. – Marcin May 29 '12 at 21:25
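The server's sequence-gap reporting is only described, not shown. A minimal sketch of per-group gap detection, assuming (hypothetically) that each datagram carries a 32-bit sequence number the publisher increments by one per datagram; `struct seq_tracker` and `seq_observe` are illustrative names, not from the original:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-group tracker; assumes each datagram carries a
   32-bit sequence number that the sender increments by one. */
struct seq_tracker {
    bool     started;
    uint32_t expected;   /* next sequence number we expect */
    uint64_t lost;       /* running count of missing datagrams */
};

/* Feed one received sequence number. Returns how many datagrams were
   skipped immediately before it (0 if in order). Late or duplicate
   datagrams (behind `expected`) return 0 and leave the state unchanged. */
uint32_t seq_observe(struct seq_tracker *t, uint32_t seq)
{
    if (!t->started) {
        t->started = true;
        t->expected = seq + 1;
        return 0;
    }
    uint32_t delta = seq - t->expected;  /* unsigned math wraps modulo 2^32 */
    if (delta >= 0x80000000u)            /* seq is older than expected */
        return 0;
    t->expected = seq + 1;
    t->lost += delta;
    return delta;
}
```

Logging a timestamp whenever `seq_observe` returns non-zero, next to timestamps of each join and leave call, would make it straightforward to check whether the gaps line up with IGMP membership changes as described above.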
