
I have quite a bewildering problem.
I'm using a big C++ library for handling some proprietary protocol over UDP on Windows XP/7. It listens on one port throughout the run of the program, and waits for connections from distant peers.

Most of the time, this works well. However, due to some problems I'd experienced, I've decided to add a simple debug print directly after the call to WSARecvFrom (the win32 function used in the library to recv datagrams from my socket of interest, and tell what IP and port they came from).
Strangely enough, in some cases, I've discovered packets are dropped at the OS level (i.e. I see them in Wireshark, they have the right dst-port, all checksums are correct - but they never appear in the debug prints I've implanted into the code).

Now, I'm fully aware of the fact (which people tend to mention a bit too often) that "UDP doesn't guarantee delivery" - but this is not relevant here, as the packets are received by the machine - I see them in Wireshark.
Also, I'm familiar with OS buffers and the potential to fill up, but here comes the weird part...

I've done some research trying to find out which packets are dropped exactly. What I've discovered, is that all dropped packets share two things in common (though some, but definitely not most, of the packets that aren't dropped share these as well):

  1. They are small. Many of the packets in the protocol are large, close to MTU - but all packets that are dropped are under 100 bytes (gross).
  2. They are always one of two: a SYN-equivalent (i.e. the first packet a peer sends to us in order to initiate communications) or a FIN-equivalent (i.e. a packet a peer sends when it is no longer interested in talking to us).

Can either one of these two qualities affect the OS buffers, and cause packets to be randomly (or even more interesting - selectively) dropped?
Any light shed on this strange issue would be very appreciated.

Many thanks.


EDIT (24/10/12):

I think I may have missed an important detail. It seems that the packets dropped before arrival share something else in common: They (and I'm starting to believe, only they) are sent to the server by "new" peers, i.e. peers that it hasn't tried to contact before.

For example, if a syn-equivalent packet arrives from a peer* we've never seen before, it will not be seen by WSARecvFrom. However, if we have sent a syn-equivalent packet to that peer ourselves (even if it didn't reply at the time), and now it sends us a syn-equivalent, we will see it.

(*) I'm not sure whether this is a peer we haven't seen (i.e. ip:port) or just a port we haven't seen before.

Does this help?
Is this some kind of WinSock option I've never heard of? (as I stated above, the code is not mine, so it may be using socket options I'm not aware of)

Thanks again!

Oded R.
  • I assume it's not being dropped due to the `RecvFrom()` `from` parameter or the socket binding? – Deanna Oct 11 '12 at 09:50
  • @Deanna: I strongly believe that's true - AFAIK `from` problem should raise some error code with WSAGetLastError (the only one I ever get is WSAEWOULDBLOCK, which AFAIK means there was nothing in the buffer at the moment); Socket binding is supposedly ok - packets before and after the dropped one (to the same port) are received and processed. – Oded R. Oct 11 '12 at 10:10
  • To the same IP too? Are you getting anything from the problem IP(s)? – Deanna Oct 11 '12 at 10:19
  • @Deanna: Yes, sometimes it's the same IP. For example, on one occasion I see 5 practically-identical syn-equivalent packets from the same IP over a period of ~5sec, all not reaching `WSARecvFrom`, but half a second later, my code initiates a connection with that IP (yes, it can initiate as well) successfully, and its reply to my syn-equivalent is received and processed successfully. – Oded R. Oct 11 '12 at 10:37
  • Possibly duplicate question: [udp packet caught by tcpdump, but not received by socket](http://stackoverflow.com/q/12838222/588306) – Deanna Oct 11 '12 at 13:01
  • Did you solve the problem? I am facing a similar issue. – Girish Sep 06 '20 at 17:55
  • @Girish I don't really remember (it was 8 years ago and I changed fields nearly 6 years ago), but I do remember finding the answers below helpful... hope one of them works for you too! – Oded R. Oct 12 '20 at 00:42

4 Answers

The OS has a fixed size buffer for data that has arrived at your socket but hasn't yet been read by you. When this buffer is exhausted, it'll start to discard data. Debug logging may exacerbate this by delaying the rate you pull data from the socket at, increasing the chances of overflows.

If this is the problem, you could at least reduce the instances of it by requesting a larger recv buffer.

You can check the size of your socket's recv buffer using

int recvBufSize = 0;
int optLen = sizeof(recvBufSize);  // in/out: getsockopt takes a pointer to the length
int err = getsockopt(socket, SOL_SOCKET, SO_RCVBUF,
                     (char*)&recvBufSize, &optLen);

and you can set it to a larger size using

int recvBufSize = /* usage specific size */;
int err = setsockopt(socket, SOL_SOCKET, SO_RCVBUF,
                     (const char*)&recvBufSize, sizeof(recvBufSize));
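
To check whether the OS actually granted the larger buffer, the two calls above can be combined. This is only a minimal sketch: `growRecvBuffer` and `socket_t` are hypothetical names, not part of the library in the question, and the `#ifdef` is there only so the same code builds against Winsock and POSIX sockets. On Windows it assumes `WSAStartup` has already been called and that the program links with `ws2_32`.

```cpp
#ifdef _WIN32
  #include <winsock2.h>
  #include <ws2tcpip.h>   // provides socklen_t on Windows
  typedef SOCKET socket_t;
#else
  #include <sys/socket.h>
  #include <netinet/in.h>
  #include <unistd.h>
  typedef int socket_t;
#endif

// Hypothetical helper: ask for a receive buffer of requestedBytes and
// return the size the OS reports afterwards, or -1 if either call fails.
int growRecvBuffer(socket_t s, int requestedBytes)
{
    if (setsockopt(s, SOL_SOCKET, SO_RCVBUF,
                   (const char*)&requestedBytes, sizeof(requestedBytes)) != 0)
        return -1;

    int granted = 0;
    socklen_t len = sizeof(granted);   // in/out length parameter
    if (getsockopt(s, SOL_SOCKET, SO_RCVBUF, (char*)&granted, &len) != 0)
        return -1;

    return granted;
}
```

The reported size may differ from the requested one, since the OS is free to cap or round it, so it is worth logging the returned value once at startup rather than assuming the request was honoured.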

If you still see data being received by the OS but not delivered to your socket client, you could think about different approaches to logging. e.g.

  • Log to a RAM buffer and only print it occasionally (at whatever size you profile to be most efficient)
  • Log from a low priority thread, either accepting that the memory requirements for this will be unpredictable or adding code to discard data from the log's buffer when it gets full
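
As a rough illustration of the first option, together with the discard-when-full policy from the second, here is a single-threaded sketch. `RamLog` and its methods are made-up names for this example. It is not thread-safe, so logging from a separate low priority thread would need locking on top of it.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical in-memory log: record() is cheap, flush() does the slow I/O.
class RamLog {
public:
    explicit RamLog(std::size_t maxEntries) : maxEntries_(maxEntries) {
        entries_.reserve(maxEntries);
    }

    // Call right after the receive call; no I/O happens here.
    // When the buffer is full, the new entry is discarded and counted.
    void record(const std::string& line) {
        if (entries_.size() < maxEntries_) entries_.push_back(line);
        else ++dropped_;
    }

    // Call occasionally, e.g. when the socket has nothing pending.
    // Writes all buffered entries and returns how many were written.
    std::size_t flush(std::FILE* out) {
        const std::size_t written = entries_.size();
        for (const std::string& e : entries_)
            std::fprintf(out, "%s\n", e.c_str());
        if (dropped_ > 0)
            std::fprintf(out, "[log] %lu entries discarded\n",
                         (unsigned long)dropped_);
        entries_.clear();
        dropped_ = 0;
        return written;
    }

    std::size_t size() const { return entries_.size(); }
    std::size_t dropped() const { return dropped_; }

private:
    std::size_t maxEntries_;
    std::size_t dropped_ = 0;
    std::vector<std::string> entries_;
};
```

The idea is that the receive loop only pays for an in-memory append, and the cost of printing is moved to moments when no datagrams are waiting.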
simonc
  • Thank you for your answer, however it still doesn't shed light on the specific characteristics of the packets being dropped (short, syn-like or fin-like, packet only). Also, I probably wasn't clear enough, but the logging didn't *cause* the problem, I added it to *investigate* the problem. I don't need it otherwise :) – Oded R. Oct 11 '12 at 08:52
  • I don't think there is any defined behaviour for what data gets dropped when a socket's recv buffer fills. Have you tried increasing the buffer size to see if that reduces the number of dropped packets? – simonc Oct 11 '12 at 09:01
  • I will try both querying and adjusting the buffer size, but I agree - `"I don't think there is any defined behaviour for what data gets dropped"` - and since I see a very distinct pattern, I wonder whether something else than the buffer may be causing this issue? – Oded R. Oct 11 '12 at 10:14
  • Only data packets are delivered to an application. This is by design of tcp/ip and sockets. "Packet of size 100 or less" makes me think you only have headers with no data. Can you confirm if those packets are destined to same ip and port and contain data too? Even better if you can post a small pcap containing one packet which was received and another which was not – fkl Oct 11 '12 at 12:15
  • @fayyazkl - Thanks for your contribution. All packets I've mentioned contain UDP data of at least 30 bytes. [This pcap](https://dl.dropbox.com/u/80287490/for_stackoverflow_12834501.cap) contains two packets that haven't been received by `WSARecvFrom`, and two that have. The IPs are real, but the UDP data (and therefore checksum) I've had to override. – Oded R. Oct 14 '12 at 08:58
  • Oh, and re. the destination IP & port - yes, as I've mentioned in the comments to the original question ( discussion with @Deanna ), all packets (successful and failing) arrive to the computer, have the same destination IP, the same destination port, and good checksums. – Oded R. Oct 14 '12 at 09:03
I had a very similar issue. After confirming that the receive buffer wasn't causing drops, I learned that the cause was a receive timeout set too low, at 1 ms. Setting the socket to non-blocking and not setting the receive timeout fixed the issue for me.
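
For illustration, one way to do this is sketched below. The answer does not say how it waited for data once the socket was non-blocking, so the use of `select()` here is an assumption, and `setNonBlocking`, `recvWithWait` and `socket_t` are hypothetical names. The `#ifdef` only lets the same code build against Winsock and POSIX sockets. On Windows it assumes `WSAStartup` has been called and the program links with `ws2_32`.

```cpp
#ifdef _WIN32
  #include <winsock2.h>
  #include <ws2tcpip.h>   // provides socklen_t on Windows
  typedef SOCKET socket_t;
#else
  #include <sys/socket.h>
  #include <sys/select.h>
  #include <sys/time.h>
  #include <netinet/in.h>
  #include <fcntl.h>
  #include <unistd.h>
  typedef int socket_t;
#endif

// Put the socket in non-blocking mode. Returns true on success.
bool setNonBlocking(socket_t s)
{
#ifdef _WIN32
    u_long on = 1;
    return ioctlsocket(s, FIONBIO, &on) == 0;
#else
    int flags = fcntl(s, F_GETFL, 0);
    return flags != -1 && fcntl(s, F_SETFL, flags | O_NONBLOCK) == 0;
#endif
}

// Wait up to timeoutMs for a datagram, then read it without blocking.
// Returns the number of bytes received, 0 on timeout (or for an empty
// datagram), and -1 on error. No per-socket receive timeout is set.
int recvWithWait(socket_t s, char* buf, int bufLen, int timeoutMs,
                 sockaddr_in* from)
{
    fd_set readable;
    FD_ZERO(&readable);
    FD_SET(s, &readable);

    timeval tv;
    tv.tv_sec = timeoutMs / 1000;
    tv.tv_usec = (timeoutMs % 1000) * 1000;

    // The first argument is ignored by Winsock but required on POSIX.
    int ready = select((int)s + 1, &readable, nullptr, nullptr, &tv);
    if (ready <= 0) return ready;   // 0 = timeout, -1 = error

    socklen_t fromLen = sizeof(*from);
    return (int)recvfrom(s, buf, bufLen, 0, (sockaddr*)from, &fromLen);
}
```

With this shape the wait time is passed per call rather than stored on the socket, so a very short value only affects how often the loop wakes up.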

zeeman_effect
Turn off the Windows Firewall.

Does that fix it? If so, you can likely turn the Firewall back on and just add a rule for your program.

That's my most logical guess based on what you said here in your update:

It seems that the packets dropped before arrival share something else in common: They (and I'm starting to believe, only they) are sent to the server by "new" peers, i.e. peers that it hasn't tried to contact before.

selbie
  • If the Windows Firewall was the problem, *none* of the data would get through, even to Wireshark. – user207421 Feb 02 '15 at 09:29
  • @EJP that's not true, at least for TCP on Vista and Seven: wireshark does show me incoming SYNs blocked by firewall. (And when I change firewall to allow, an identical connection attempt is shown *and* delivered.) I can't as easily test UDP, but I'd be astonished if it's different. Remember wireshark/winpcap pokes into the networking stack somewhere, it *doesn't* look at process-level activity like say ProcMon. That said, I don't think Windows Firewall is the answer for the problem in this question. – dave_thompson_085 Jul 10 '15 at 09:38
  • @dave_thompson_085 Same experience with Wireshark: my firewall allowed one side and not the other. As soon as I allowed both sides, the app got all the packets. – user3097514 Aug 09 '17 at 23:58
I faced the same kind of problem on Red Hat Linux as well. It turned out to be a routing issue.

RCA is as follows:

  1. It is true that the UDP packet reaches the destination machine (it is seen in Wireshark).
  2. However, no route back to the source is found, so no reply can be seen in Wireshark.
  3. On some OSes you can see the request packet in Wireshark, but the OS does not actually deliver the packet to the socket (you can see this socket in `netstat -nap`).
  4. In such cases, always check connectivity with ping (ping <dest ip> -I<source ip>).
Brian Tompsett - 汤莱恩