
I have an application that receives, processes, and transmits UDP packets.

Everything works fine if the port numbers for reception and transmission are different.

If the port numbers are the same and the IP addresses are different, it usually works fine, EXCEPT when the IP addresses are on the same subnet as the machine running the application. In that case the send_to function takes several seconds to complete, instead of the usual few milliseconds.

Rx Port   Tx IP              Tx Port   Result
5001      Same subnet        5002      OK      Delay ~ 0.001 secs
5001      Different subnet   5001      OK      Delay ~ 0.001 secs
5001      Same subnet        5001      Fails   Delay > 2 secs

Here is a short program that demonstrates the problem.

#include <ctime>
#include <iostream>
#include <string>
#include <boost/array.hpp>
#include <boost/asio.hpp>
#include <windows.h> // QueryPerformanceCounter / QueryPerformanceFrequency / LARGE_INTEGER

using boost::asio::ip::udp;
using std::cout;
using std::endl;

void test( const std::string& output_IP )
{
    try
    {
        boost::asio::io_service io_service;

        // build the input socket

        /* This is connected to a UDP client that is running continuously
        sending messages that include an incrementing sequence number
        */

        const int input_port = 5001;
        udp::socket input_socket(io_service, udp::endpoint(udp::v4(), input_port ));

        // build the output socket

        const std::string output_Port = "5001";
        udp::resolver resolver(io_service);
        udp::resolver::query query(udp::v4(), output_IP, output_Port );
        udp::endpoint output_endpoint = *resolver.resolve(query);
        udp::socket output_socket( io_service );
        output_socket.open(udp::v4());

        // double the output buffer size ( the Windows default is 8192 bytes )
        boost::asio::socket_base::send_buffer_size option( 8192 * 2 );
        output_socket.set_option(option);

        cout  << "TX to " << output_endpoint.address() << ":"  << output_endpoint.port() << endl;



        int count = 0;
        for (;;)
        {
            // receive packet
            unsigned short recv_buf[ 20000 ];   // 40000 bytes of storage; at most 20000 bytes are read below
            udp::endpoint remote_endpoint;
            boost::system::error_code error;
            std::size_t bytes_received = input_socket.receive_from(boost::asio::buffer(recv_buf,20000),
                                 remote_endpoint, 0, error);

            if (error && error != boost::asio::error::message_size)
                throw boost::system::system_error(error);

            // start timer
            __int64 TimeStart;
            QueryPerformanceCounter( (LARGE_INTEGER *)&TimeStart );

            // send onwards
            boost::system::error_code ignored_error;
            output_socket.send_to(
                boost::asio::buffer(recv_buf,bytes_received),
                output_endpoint, 0, ignored_error);

            // stop time and display tx time
            __int64 TimeEnd;
            QueryPerformanceCounter( (LARGE_INTEGER *)&TimeEnd );
            __int64 f;
            QueryPerformanceFrequency( (LARGE_INTEGER *)&f );
            cout << "Send time secs " << (double) ( TimeEnd - TimeStart ) / (double) f << endl;

            // stop after loops
            if( count++ > 10 )
                break;
        }
    }
    catch (std::exception& e)
    {
        std::cerr << e.what() << std::endl;
    }

}
int main(  )
{

    test( "193.168.1.200" );

    test( "192.168.1.200" );

    return 0;
}

The output from this program, when run on a machine with address 192.168.1.101:

TX to 193.168.1.200:5001
Send time secs 0.0232749
Send time secs 0.00541566
Send time secs 0.00924535
Send time secs 0.00449014
Send time secs 0.00616714
Send time secs 0.0199299
Send time secs 0.00746081
Send time secs 0.000157972
Send time secs 0.000246775
Send time secs 0.00775578
Send time secs 0.00477618
Send time secs 0.0187321
TX to 192.168.1.200:5001
Send time secs 1.39485
Send time secs 3.00026
Send time secs 3.00104
Send time secs 0.00025927
Send time secs 3.00163
Send time secs 2.99895
Send time secs 6.64908e-005
Send time secs 2.99864
Send time secs 2.98798
Send time secs 3.00001
Send time secs 3.00124
Send time secs 9.86207e-005

Why is this happening? Is there any way I can reduce the delay?

Notes:

  • Built using code::blocks, running under various flavours of Windows

  • Packets are 10000 bytes long

  • The problem goes away if I connect the computer running the application to a second network, for example a WWAN ( cellular network "rocket stick" )

As far as I can tell, this is the situation we have:

  • This works ( different ports, same LAN )

  • This also works ( same ports, different LANs )

  • This does NOT work ( same ports, same LAN )

  • This seems to work ( same ports, same LAN, dual-homed Module2 host )

[network diagrams omitted]

ravenspoint
  • I really find it hard to believe. I would suggest removing all references to boost and building the same functionality using raw BSD sockets. – SergeyA Nov 23 '15 at 15:58
  • In no way send can take 3 seconds. And in no way there is a general issue with sending UDP packets over the local network where all listeners are bound to the same port. – SergeyA Nov 23 '15 at 16:01
  • @the_non_believers This happens for three different people at different locations. – ravenspoint Nov 23 '15 at 16:07
  • I can believe boost::asio::* is terribly broken. Never used it. But I have done *a lot* of network programming, and I know for a fact, there is no general problem with sending datagrams to the same port on a different node. This is why I am suggesting removing all reference to boost - the example can be easily rewritten in plain BSD. You can have a terribly broken network stack on your machine, but I'd rather believe in something else. – SergeyA Nov 23 '15 at 16:10
  • @SergeyA I need this to work with boost::asio, it is a tiny part of a large application that depends on boost::asio – ravenspoint Nov 23 '15 at 16:12
  • But you should first check out all possibilities, right? Check plain BSD in the simple example. If it misbehaves, then... I do not know what. Talk to network engineers. If it does not (as I expect!) you can look deeply into the boost implementation. It might be broken. ACE is, for instance. – SergeyA Nov 23 '15 at 16:15
  • It is not just my machine. This happens with two users of the real application, in very different locations – ravenspoint Nov 23 '15 at 16:20
  • Right. This is why I say that the network stack is last to blame. Do plain sockets first, then we will talk. ;) – SergeyA Nov 23 '15 at 16:22
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/95953/discussion-between-ravenspoint-and-sergeya). – ravenspoint Nov 23 '15 at 16:23
  • Can't go in the chat, blocked from my corp network (speaking of nets!) :) – SergeyA Nov 23 '15 at 16:25
  • (copied from chat) I do not really know what you mean by "plain sockets". I have used only boost::asio for network programming (for the last ten years). Can you post such code? – ravenspoint Nov 23 '15 at 16:32
  • Ok, will do. Stay tuned. – SergeyA Nov 23 '15 at 16:36
  • Maybe there is something about ARP going on. Is the target IP connected/reachable? Can you find the IP in the ARP table (command "arp -a" for all OSes)? – ElderBug Nov 23 '15 at 16:41
  • @ElderBug neither of the destination addresses exist in my test setup. The application users who report the problem, do have machines listening on the destination addresses. The behaviour is the same whether or not the addresses physically exist. They are not in the ARP table for my setup. – ravenspoint Nov 23 '15 at 16:56
  • I'm not familiar with `boost:asio`, but the most likely explanation would be that you're not shutting down the `input_socket` from the first call to `test`, so on the second call, you have two sockets trying to receive the same data from the client. The times you are measuring are not (just) the time for sending, but also the time for the asynchronous receive to complete -- with fighting sockets, that probably causes something bad to happen. Try shutting down the input socket after the loop in `test` before returning. – Chris Dodd Nov 23 '15 at 17:05
  • @ChrisDodd Thank you for your response. If I swap the order of the test calls, so that the 'bad' address is used first on a 'fresh' socket the results are the same. Note also that when the sockets go out of scope, their destructors will close them. – ravenspoint Nov 23 '15 at 17:08
  • I think Chris is onto something. The delay may be because the `asio::buffer` receive is not completed, and the send waits for the async receive to complete, because it needs the buffer (sorry if it seems obvious, I don't know boost::asio). The receive would then be completed 3s later. What happens every 3s? It seems you do something, since it is constant. – ElderBug Nov 23 '15 at 17:26
  • Posted my code and results - of course, it is lighting fast. Look into boost stuff. – SergeyA Nov 23 '15 at 18:11
  • The majority of time I observe this type of behavior is because of the network configuration. Consider using [iperf](http://software.es.net/iperf/) to independently measure the network, then use a network analyzer tool, such as [wireshark](https://www.wireshark.org/) to get a deeper view into what is occurring on a given node. When debugging, sometimes changing to a TCP protocol or different port may illuminate network characteristics (throttling, shaping, etc). – Tanner Sansbury Nov 23 '15 at 18:37
  • @TannerSansbury, there.is.no.network.configuration.here. `send` performance is 100% agnostic of network configuration. – SergeyA Nov 23 '15 at 18:38
  • @SergeyA For what it is worth, I have experience on systems where `send()` has been affected by network configuration. – Tanner Sansbury Nov 23 '15 at 19:08
  • @TannerSansbury, find it hard to believe. Care to name such a system? – SergeyA Nov 23 '15 at 19:09
  • For reference, you _do_ have weird networking issues: https://stackoverflow.com/a/33848090/85371 This is not normal. I have the strong feeling there has been a similar post, possibly from a coworker of yours, over the last week. – sehe Nov 23 '15 at 19:19
  • @sehe I posted that question. The root problem is the same. I found a workaround for the effects of the problem posted there ( reader starvation due to the slow transmission ), but my users are still very unhappy about the 2 plus second delay they see. – ravenspoint Nov 23 '15 at 19:59
  • @SergeyA On some RTOSes that do not have a dedicated kernel transmit buffer, `send()` may block waiting for ARP resolution. At one point, Integrity exhibited this behavior. I believe Windows exhibits similar behavior if the datagram exceeds the threshold that determines whether the user buffer gets copied or memory mapped. – Tanner Sansbury Nov 23 '15 at 20:07
  • @ravenspoint When sending datagrams to `192.168.1.200:5001`, do you observe the same performance for smaller datagrams? The `HKLM\System\CurrentControlSet\Services\Afd\Parameters\FastSendDatagramThreshold` registry value (default `1024`) can affect UDP performance. Datagrams exceeding this value are held until the datagram is actually sent. – Tanner Sansbury Nov 23 '15 at 20:18
  • @TannerSansbury Short packets do not exhibit the problem – ravenspoint Nov 23 '15 at 20:20
  • @TannerSansbury Would that effect not be the same for every address? – ravenspoint Nov 23 '15 at 20:21
  • @TannerSansbury, amazing. I always knew I should stay away from Windows :) But at least, you Windows magician have a job security :) – SergeyA Nov 23 '15 at 20:23
  • @SergeyA Tanner is an Asio magician. Don't accuse him of windows :) – sehe Nov 23 '15 at 20:32
  • @sehe, HKLM\System\CurrentControlSet\Services\Afd\Parameters\FastSendDatagramThreshold looks windows enough to me! :) – SergeyA Nov 23 '15 at 20:33
  • @TannerSansbury, I also believe, some apologies are in order. From my side, it is of course. – SergeyA Nov 23 '15 at 20:36
  • No problem for me whatsoever (20k packets): http://paste.ubuntu.com/13479999/ (that's with [this code](http://paste.ubuntu.com/13480004/)). The elephant in the room is: **what are the addresses**. Did you trace route both IPs? – sehe Nov 23 '15 at 20:37
  • @sehe The addresses are in the code I posted. They are arbitrary. The effect is the same whether or not there is anything listening at this addresses. In my setup there is nothing at those addresses. – ravenspoint Nov 23 '15 at 20:39
  • Did you try trace routing?!? – sehe Nov 23 '15 at 20:40
  • @sehe. There are no routes to the addresses. The delay is occurring in the completion of send_to, not in transmitting the packet through the network. – ravenspoint Nov 23 '15 at 20:42
  • @ravenspoint No, they would not be affected in the same manner. When sending to the same subnet, ARP resolution will need to occur. I suspect that datagrams larger than the `FastSendDatagramThreshold` are blocking `send()` while waiting for ARP resolution to occur on each send (as results are not being cached). When sending to an address not on the subnet, the network stack will use the default gateway's ARP response to fill out the ethernet frame. – Tanner Sansbury Nov 23 '15 at 20:46
  • @SergeyA No problem! It sounds as though people learned something, including myself, so everyone wins. (: – Tanner Sansbury Nov 23 '15 at 20:49
  • @TannerSansbury You have provided the first hope I have seen on this. Can you explain a bit more how to change this default parameter? HKLM\System\CurrentControlSet\Services\Afd\Parameters\FastSendDatagramThreshold does not exist on my system – ravenspoint Nov 23 '15 at 20:50
  • The closest I can find is HKEY_LOCAL_MACHINE\SYSTEM\ControlSet001\Services\AFD – ravenspoint Nov 23 '15 at 20:51
  • @TannerSansbury, looks like we have our answer? Post it, and I will be first to upvote. – SergeyA Nov 23 '15 at 21:03
  • @TannerSansbury I added HKEY_LOCAL_MACHINE\SYSTEM\ControlSet001\Services\AFD\Parameters with the value 2048 to the registry, rebooted and tried again. No change. – ravenspoint Nov 23 '15 at 21:22
  • @ravenspoint I have not used Windows in many years, so I neither know exactly where `FastSendDatagramThreshold` resides in the registry nor the effects of changing it beyond what is documented. Regardless, the increase in latency waiting for ARP resolution will occur either way. If you do not want user code to be affected by it, then consider using async operations. – Tanner Sansbury Nov 24 '15 at 21:45
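
Following up on the async suggestion in the last comment, here is a minimal sketch (mine, not from the thread) of replacing the blocking send_to in the question's loop with async_send_to, so the sending thread is not held up while the stack waits for ARP. It reuses the question's output_socket, output_endpoint, recv_buf and bytes_received names.

    // Queue the datagram; async_send_to returns immediately and the
    // handler runs later, when the send actually completes.
    output_socket.async_send_to(
        boost::asio::buffer(recv_buf, bytes_received),
        output_endpoint,
        [](const boost::system::error_code& ec, std::size_t /*bytes_sent*/)
        {
            if (ec)
                std::cerr << "send failed: " << ec.message() << std::endl;
        });

The completion handlers must be driven by io_service.run(), typically from a worker thread or by making the receive side asynchronous as well. Note also that recv_buf must stay valid until the handler fires, which the question's stack-allocated buffer does not guarantee; a real implementation would keep pending datagrams in a queue owned by the application.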

3 Answers


Given this is being observed on Windows for large datagrams with a destination address of a non-existent peer within the same subnet as the sender, the problem is likely the result of send() blocking waiting for an Address Resolution Protocol (ARP) response so that the layer-2 Ethernet frame can be populated:

  • When sending data, the layer-2 Ethernet frame will be populated with the media access control (MAC) address of the next hop in the route. If the sender does not know the MAC address for the next hop, it broadcasts an ARP request and caches responses. Using the sender's subnet mask and the destination address, the sender can determine whether the next hop is on the same subnet as the sender or whether the data must route through the default gateway (a sketch after this list makes the test concrete). Based on the results in the question, when sending large datagrams:

    • datagrams destined to a different subnet have no delay because the default gateway's MAC Address is within the sender's ARP cache
    • datagrams destined to a non-existent peer on the sender's subnet incur a delay waiting for ARP resolution
  • The socket's send buffer size (SO_SNDBUF) is being set to 16384 bytes, but the size of the datagrams being sent is 10000. The behavior of send() when the buffer is saturated is unspecified, but some systems will observe send() blocking. In this case, saturation would occur fairly quickly if any datagrams incur a delay, such as by waiting for an ARP response.

    // Datagrams being sent are 10000 bytes, but the socket buffer is 16384.
    boost::asio::socket_base::send_buffer_size option(8192 * 2);
    output_socket.set_option(option);
    

    Consider letting the kernel manage the socket buffer size or increasing it based on your expected throughput.

  • When sending a datagram with a size that exceeds the Windows registry FastSendDatagramThreshold parameter, the send() call can block until the datagram has been sent. For more details, see the Microsoft TCP/IP Implementation Details:

    Datagrams smaller than the value of this parameter go through the fast I/O path or are buffered on send. Larger ones are held until the datagram is actually sent. The default value was found by testing to be the best overall value for performance. Fast I/O means copying data and bypassing the I/O subsystem, instead of mapping memory and going through the I/O subsystem. This is advantageous for small amounts of data. Changing this value is not generally recommended.
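
To make the routing decision in the first bullet concrete, here is an illustrative sketch (not from the answer) of the same-subnet test, treating IPv4 addresses as 32-bit host-order integers and assuming the /24 mask implied by the question's addresses:

    #include <cstdint>
    #include <iostream>

    // Same subnet => ARP for the destination itself;
    // different subnet => ARP for the default gateway (usually already cached).
    bool same_subnet(std::uint32_t local, std::uint32_t dest, std::uint32_t mask)
    {
        return (local & mask) == (dest & mask);
    }

    int main()
    {
        const std::uint32_t mask  = 0xFFFFFF00; // 255.255.255.0
        const std::uint32_t local = 0xC0A80165; // 192.168.1.101 (the sender)
        const std::uint32_t fast  = 0xC1A801C8; // 193.168.1.200, fast sends
        const std::uint32_t slow  = 0xC0A801C8; // 192.168.1.200, slow sends

        std::cout << same_subnet(local, fast, mask) << "\n"; // 0: via gateway
        std::cout << same_subnet(local, slow, mask) << "\n"; // 1: ARP for peer
    }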

If one is observing delays for each send() to an existing peer on the sender's subnet, then profile and analyze the network:

  • Use iperf to measure the network potential throughput
  • Use wireshark to get a deeper view into what is occurring on a given node. Look for ARP request and responses.
  • From the sender's machine, ping the peer and then check the ARP cache. Verify that there is a cache entry for the peer and that it is correct.
  • Try a different port and/or TCP. This can help identify if a network's policies are throttling or shaping traffic for a particular port or protocol.

Also note that sending datagrams below the FastSendDatagramThreshold value in quick succession while waiting for ARP to resolve may cause datagrams to be discarded:

ARP queues only one outbound IP datagram for a specified destination address while that IP address is being resolved to a media access control address. If a User Datagram Protocol (UDP)-based application sends multiple IP datagrams to a single destination address without any pauses between them, some of the datagrams may be dropped if there is no ARP cache entry already present. An application can compensate for this by calling the iphlpapi.dll routine SendArp() to establish an ARP cache entry, before sending the stream of packets.
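
A hedged sketch of the SendArp() workaround mentioned in the quote: resolve the peer's MAC address once, up front, so that the per-datagram sends do not stall. Error handling is minimal, and the destination is the question's same-subnet peer.

    // Link against iphlpapi and ws2_32 (e.g. -liphlpapi -lws2_32 under MinGW).
    #include <winsock2.h>
    #include <iphlpapi.h>
    #include <iostream>

    int main()
    {
        ULONG mac[2];                 // room for the 6-byte MAC address
        ULONG mac_len = sizeof(mac);
        IPAddr dest = inet_addr("192.168.1.200"); // same-subnet peer

        // Blocks once, here, instead of inside every subsequent send_to().
        DWORD rc = SendARP(dest, INADDR_ANY, mac, &mac_len);
        if (rc != NO_ERROR)
            std::cerr << "SendARP failed: " << rc << "\n"; // e.g. host absent
        return 0;
    }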

Tanner Sansbury
  • I added HKLM\System\CurrentControlSet\Services\Afd\Parameters\FastSendDatagramThreshold to my registry with a value of 2048 ( decimal ), rebooted and tried again. No change. – ravenspoint Nov 25 '15 at 01:34
  • @ravenspoint As noted above, changing `FastSendDatagramThreshold` is generally not recommended. I would strongly recommend solving the problem, not the symptom. If you are willing to accept all the risk involved in changing `FastSendDatagramThreshold`, then consult the TCP/IP Implementation Details for the version of Windows you are using to determine the location of the AFD Registry Parameters. – Tanner Sansbury Nov 25 '15 at 04:39
  • "I would strongly recommend solving the problem" Yes, that is what I have been trying to do for the last five days.. I understood that your recommendation was to change this value. What is it that you are recommending? – ravenspoint Nov 25 '15 at 13:00
  • "ARP is used for mapping a network address (e.g. an IPv4 address) to a physical address like an Ethernet address (also named a MAC address)." It seems to me that ARP has nothing to do with port numbers, all it should care about is IP addresses. In my case ipa:5001, ipb:5001 fails but ipa:5001,ipb:5002 succeeds. Why would ARP respond differently when I change the port number? – ravenspoint Nov 25 '15 at 13:24
  • @ravenspoint My recommendation has been to use network analysis tools to investigate further and identify the problem. Changing higher-level variables without definitively identifying the fundamental problem is only introducing noise. – Tanner Sansbury Nov 25 '15 at 13:51
  • The problem occurs whether or not there are hosts connected to the network addresses. There is nothing for network analysis tools to see. – ravenspoint Nov 25 '15 at 14:21
  • @ravenspoint wireshark is not capturing any traffic related (not necessarily destined) to the network addresses (regardless of the peer's existence)? – Tanner Sansbury Nov 25 '15 at 15:21
  • FYI. The problem vanishes if I connect a second network ( cellular network 'rocket stick' ) – ravenspoint Nov 25 '15 at 16:27
  • @ravenspoint Incase it was not clear, when attempting to send a UDP message to a network address on which no hosts is connected, there should be ARP request trying to discover who is at the destined network address. For networking problems like this, I have had far more success in diagnosing issues like this by systematically probing deeper (examining the network stack), rather than changing higher-level variables (multihoming). – Tanner Sansbury Nov 25 '15 at 23:20
  • Why do you think ARP has anything to do with it? The problem goes away if I change one of the port numbers. Surely ARP has nothing to do with port numbers? – ravenspoint Nov 25 '15 at 23:23
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/96193/discussion-between-tanner-sansbury-and-ravenspoint). – Tanner Sansbury Nov 25 '15 at 23:35
  • @ravenspoint, curious - did you discover anything? It is a rather interesting problem! – SergeyA Dec 02 '15 at 14:26
  • @sergeyA Nothing new. My users have agreed to change their port numbers. – ravenspoint Dec 02 '15 at 14:28
  • @ravenspoint For your test case, I believe the small `SO_SNDBUF` value may be a contributing factor to the observed delays. Given the size of your data, consider letting the kernel manage the socket send buffer size or increasing it, even if changing port numbers has reduced the time blocked in `send()`. With Asio and BSD sockets on Linux, I can reproduce a ~3s delay by saturating the socket send buffer with UDP messages being sent to addresses on the same subnet but no machine at the destination. Analyzing the network stack shows that the messages are timing out waiting for ARP. – Tanner Sansbury Dec 02 '15 at 21:12
  • @TannerSansbury Small SO_SNDBUF value? I have doubled it, so that it is bigger than the packet size. The packets only arrive every 50 ms, so the socket should never become saturated. Do you think I should increase the send buffer even more? – ravenspoint Dec 02 '15 at 21:52
  • @ravenspoint I do not have enough context of the application protocol and network characteristics to definitely state if saturation will or will not occur. However, it is a detail that could quickly compound the issue if the arrival rate exceeds the send rate. At a 50ms arrival interval, the example program on Linux and OSX is observing saturation of the socket send buffer with the scenario described above. – Tanner Sansbury Dec 02 '15 at 23:01
  • @TannerSansbury If packets arrive every 50ms and require 1 ms to be transmitted, there is no way anything can become saturated. – ravenspoint Dec 02 '15 at 23:34
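
As an aside to the buffer discussion above, a small sketch (mine, not from the thread) of reading back and enlarging the socket send buffer with Asio; whether the OS honors the requested size is system dependent. It assumes an open UDP socket named output_socket, as in the question.

    // Read the current SO_SNDBUF value.
    boost::asio::socket_base::send_buffer_size current;
    output_socket.get_option(current);
    std::cout << "SO_SNDBUF is " << current.value() << " bytes\n";

    // Ask for room for several 10000-byte datagrams; the kernel may
    // round or clamp the value.
    boost::asio::socket_base::send_buffer_size larger(10000 * 8);
    output_socket.set_option(larger);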

Alright, I put together some code (below). It is clear that send takes less than one millisecond most of the time. This proves the problem is with boost.

#include <iostream>
#include <string>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <stdexcept>
#include <poll.h>
#include <string>
#include <memory.h>
#include <chrono>
#include <stdio.h>

void test( const std::string& remote, const std::string& hello_string, bool first)
{
    try
    {
        const short unsigned input_port = htons(5001);
        int sock = socket(AF_INET, SOCK_DGRAM, 0);
        if (sock == -1) {
            perror("Socket creation error: ");
            throw std::runtime_error("Could not create socket!");
        }

        sockaddr_in local_addr;
        memset(&local_addr, 0, sizeof(local_addr)); // zero the struct before use
        local_addr.sin_family = AF_INET;            // required by bind()
        local_addr.sin_port = input_port;
        local_addr.sin_addr.s_addr = INADDR_ANY;
        if (bind(sock, (const sockaddr*)&local_addr, sizeof(local_addr))) {
            perror("Error: ");
            throw std::runtime_error("Can't bind to port!");
        }

        sockaddr_in remote_addr;
        memset(&remote_addr, 0, sizeof(remote_addr));
        remote_addr.sin_family = AF_INET;
        remote_addr.sin_port = input_port;
        if (!inet_aton(remote.c_str(), &remote_addr.sin_addr))
            throw std::runtime_error("Can't parse remote IP address!");

        std::cout  << "TX to " << remote << "\n";

        unsigned char recv_buf[40000];

        if (first) {
            std::cout << "First launched, waiting for hello.\n";
            int bytes = recv(sock, &recv_buf, sizeof(recv_buf), 0);
            std::cout << "Seen hello from my friend here: " << recv_buf << ".\n";
        }

        int count = 0;
        for (;;)
        {

            std::chrono::high_resolution_clock::time_point start = std::chrono::high_resolution_clock::now();
            if (sendto(sock, hello_string.c_str(), hello_string.size() + 1, 0, (const sockaddr*)&remote_addr, sizeof(remote_addr)) != hello_string.size() + 1) {
                perror("Sendto error: ");
                throw std::runtime_error("Error sending data");
            }
            std::chrono::high_resolution_clock::time_point end = std::chrono::high_resolution_clock::now();

            std::cout << "Send time nanosecs " << std::chrono::duration_cast<std::chrono::nanoseconds>(end - start).count() << "\n";

            int bytes = recv(sock, &recv_buf, sizeof(recv_buf), 0);
            std::cout << "Seen hello from my friend here: " << recv_buf << ".\n";

            // stop after loops
            if (count++ > 10)
                break;
        }
    }
    catch (std::exception& e)
    {
        std::cerr << e.what() << std::endl;
    }

}
int main(int argc, char* argv[])
{
    if (argc < 4) {
        std::cerr << "Usage: socktest <remote-ip> <hello-string> <f|anything>\n";
        return 1;
    }

    test(argv[1], argv[2], *argv[3] == 'f');

    return 0;
}

As expected, there is no delay. Here is output from one of the pairs (I run the code in pairs on two machines in the same network):

./socktest x.x.x.x 'ThingTwo' f
TX to x.x.x.x
First launched, waiting for hello.
Seen hello from my friend here: ThingOne.
Send time nanosecs 17726
Seen hello from my friend here: ThingOne.
Send time nanosecs 6479
Seen hello from my friend here: ThingOne.
Send time nanosecs 6362
Seen hello from my friend here: ThingOne.
Send time nanosecs 6048
Seen hello from my friend here: ThingOne.
Send time nanosecs 6246
Seen hello from my friend here: ThingOne.
Send time nanosecs 5691
Seen hello from my friend here: ThingOne.
Send time nanosecs 5665
Seen hello from my friend here: ThingOne.
Send time nanosecs 5930
Seen hello from my friend here: ThingOne.
Send time nanosecs 6082
Seen hello from my friend here: ThingOne.
Send time nanosecs 5493
Seen hello from my friend here: ThingOne.
Send time nanosecs 5893
Seen hello from my friend here: ThingOne.
Send time nanosecs 5597
SergeyA
  • Unfortunately this does not compile under code::blocks on windows. I tried replacing sockets.h with winsock.h but now I am all tied up in POSIX problems – ravenspoint Nov 23 '15 at 19:55
  • @ravenspoint, this should work on Windows, provided you initialize winsock. AFAIR, all those functions are defined for Windows as well? If not, just replace them with WSA counterparts. – SergeyA Nov 23 '15 at 20:01
  • You would not need any system headers then, just remove all of them. – SergeyA Nov 23 '15 at 20:01
  • I tried removing them. Get 'inet_aton' was not declared in this scope| – ravenspoint Nov 23 '15 at 20:02
  • According to MSDN, inet_aton is inet_addr in your world. – SergeyA Nov 23 '15 at 20:04
  • too many arguments to function 'long unsigned int inet_addr(const char*)'| – ravenspoint Nov 23 '15 at 20:05
  • Yeah - it returns int. Something like this should do: `remote_addr.sin_addr.s_addr = inet_addr(remote.c_str())` – SergeyA Nov 23 '15 at 20:08
  • The string you are sending looks to be very short. Short messages do not cause this problem. Typical packets that cause the problem are 1000 bytes long – ravenspoint Nov 23 '15 at 20:09
  • Ok, you didn't tell me that. You want a string of 1000? I can check this. – SergeyA Nov 23 '15 at 20:09
  • You will need to double the send buffer size to properly handle the large packets ( default in windows is 8192 ) – ravenspoint Nov 23 '15 at 20:16
  • No issues on Linux :) Just sent 10K packets. Of course, took longer - but still nowhere in your territory, lower ten thousands of nanoseconds. – SergeyA Nov 23 '15 at 20:20
  • No problem for me whatsoever (20k packets): http://paste.ubuntu.com/13479999/ (that's with [this code](http://paste.ubuntu.com/13480004/)). The elephant in the room is: **what are the addresses**. Did you trace route both IPs? – sehe Nov 23 '15 at 20:36
  • @sehe I am suspicious of the small socket send buffer (`SO_SNDBUF`) being set in the original code. It is set to 1.6x the size of the datagrams being sent. With Asio and BSD sockets on Linux, I can reproduce a ~3s delay by saturating the socket send buffer with UDP messages being sent to addresses on the same subnet but no machine at the destination. Analyzing the network stack shows that the messages are timing out waiting for ARP. – Tanner Sansbury Dec 02 '15 at 21:16

It is good practice to segregate Tx and Rx ports. I derive my own socket class from CAsyncSocket, as it has a message pump that sends a system message when data is received on your socket and invokes the OnReceive function (either yours, if you override the underlying virtual function, or the default if you don't).
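
A minimal sketch of the CAsyncSocket approach this answer describes (MFC, so it does not directly answer the boost::asio question); the class name and buffer size are illustrative only.

    // Requires an MFC project; call AfxSocketInit() once at startup.
    #include <afxsock.h>

    class CRelaySocket : public CAsyncSocket
    {
    protected:
        // Called via the MFC message pump when a datagram arrives.
        virtual void OnReceive(int nErrorCode)
        {
            char buf[10000];
            CString fromAddr;
            UINT fromPort = 0;
            int bytes = ReceiveFrom(buf, sizeof(buf), fromAddr, fromPort);
            // ... forward 'bytes' bytes on the transmit socket here ...
            CAsyncSocket::OnReceive(nErrorCode);
        }
    };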

CobraZulu
  • Thank you for your response. CAsyncSocket is, I believe, part of MFC. This question is about boost::asio. Remember to check the question tags. – ravenspoint Nov 24 '15 at 14:19