2

The MVCE below simplified from real codebase shows the same issue.

The server continuously sends a "burst" of 5 UDP frames filled with 150 bytes of value 0xA5 with small or no delay in between. A pause of 1 second is made.

The client use the boost::asio async_receive_from() function in parallel of a 1 second timer. The client works relatively well except when the delay between the UDP frames is "too" small. It seems that the correct size ( here 150 bytes ) is retrieved but the buffer/vector seems not to be updated.

  • 5 x 150 bytes of UDP frames does not seem much.
  • Wireshark DOES see the complete and correct frames sent.
  • If I use a synchronous boost asio socket synchronous receive_from() I meet no issues

I tried maybe half a dozen times to dive into boost asio without much success in finding a single truth or rationale. Same posts on SO show very different code so that it is difficult to transpose them to the present code

Here are the code client (client_with_timer.cc)

#include <iostream>
#include <vector>
#include <string>
#include <boost/asio.hpp>
#include <boost/bind.hpp>
#include <boost/date_time/posix_time/posix_time.hpp>

using namespace boost::asio;
void asyncReadHandler( const boost::system::error_code& error, std::size_t bytesTransferred );
void timeoutHandler( const boost::system::error_code& error, bool* ptime_out );

size_t ReceivedDataSize;
std::string ReadError;

int main(int argc, char * argv[])
{
    io_service io;

    ip::udp::socket socket(io, ip::udp::endpoint(ip::udp::v4(), 1620));

    size_t num = 0;

    while (true)
    {
        std::vector<unsigned char> vec(1500);

        ip::udp::endpoint from;

        socket.async_receive_from(
                        boost::asio::buffer( vec ),
                        from,
                        boost::bind(
                                asyncReadHandler,
                                boost::asio::placeholders::error,
                                boost::asio::placeholders::bytes_transferred ) );

        bool timeout = false;
        ReceivedDataSize = 0;
        ReadError = "";

        // Creating and starting timer (by registering timeout handler)
        deadline_timer timer( io, boost::posix_time::seconds( 1 ) );
        timer.async_wait(
            boost::bind( timeoutHandler, boost::asio::placeholders::error, &timeout ) );

        // Resetting IO service instance
        io.reset();

        while(io.run_one())
        {
            if ( timeout ) {
                socket.cancel();
                timer.cancel();
                //Leave the io run_one loop
                break;
            }
            else if ( (0 != ReceivedDataSize ) || (!ReadError.empty())) {
                timer.cancel();
                socket.cancel();
                std::cout << "Received n°" <<  num++ << ": " << ReceivedDataSize << "\r" << std::flush;

                if (0 != ReceivedDataSize )
                    vec.resize(ReceivedDataSize);

                if (!ReadError.empty())
                    std::cout << "Error: " << ReadError << std::endl;

                bool result = true;
                for ( auto x : vec )
                    if ( 0xA5 != x ) { result = false; break; }

                if ( false == result ) {
                    std::cout << std::endl << "Bad reception" << std::endl << std::hex;
                    for ( auto x : vec )
                        std::cout << (int)x << " ";

                    std::cout << std::dec << "\n";
                }
                //Leave the io run_one loop
                break;
            }
            else {
                //What shall I do here ???
                //another potential io.reset () did not bring much
            }

        }
    }

    return 0;
}

void asyncReadHandler( const boost::system::error_code& error, std::size_t bytesTransferred )
{
    // If read canceled, simply returning...
    if( error == boost::asio::error::operation_aborted ) return;

    ReceivedDataSize = 0;

    // If no error
    if( !error ) {
        ReceivedDataSize = bytesTransferred;
    }
    else {
        ReadError = error.message();
    }
}

void timeoutHandler( const boost::system::error_code& error, bool* ptime_out )
{
    // If timer canceled, simply returning...
    if( error == boost::asio::error::operation_aborted ) return;

    // Setting timeout flag
    *ptime_out = true;
}

Here is the server (server.cc) so that you do not have to roll your own

#include <iostream>
#include <vector>
#include <boost/asio.hpp>
#include <boost/bind.hpp>
#include <unistd.h>

using namespace boost::asio;

int main(int argc, char * argv[])
{
    io_service io;

    ip::udp::socket socket(io, ip::udp::endpoint(ip::udp::v4(), 0));

    std::vector<char> vec(150,0xA5);
#if 1
    int separator = 1 * 1000;
#else
    int separator = 0;
#endif

    while (true)
    {
        socket.send_to(buffer(vec), ip::udp::endpoint(ip::udp::v4(), 1620));
        if ( separator ) usleep(separator);
        socket.send_to(buffer(vec), ip::udp::endpoint(ip::udp::v4(), 1620));
        if ( separator ) usleep(separator);
        socket.send_to(buffer(vec), ip::udp::endpoint(ip::udp::v4(), 1620));
        if ( separator ) usleep(separator);
        socket.send_to(buffer(vec), ip::udp::endpoint(ip::udp::v4(), 1620));
        if ( separator ) usleep(separator);
        socket.send_to(buffer(vec), ip::udp::endpoint(ip::udp::v4(), 1620));

        usleep(1000*1000);
    }

    return 0;
}

I compiled both with the naive commands below:

g++ client_with_timer.cc -std=c++11 -O2 -Wall -o client_with_timer -lboost_system

g++ server.cc -std=c++11 -O2 -Wall -o server -lboost_system

It produces output like below when delay is too small

nils@localhost ASIO_C]$ ./client_with_timer 
Received n°21: 150
Bad reception
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
Received n°148: 150
Bad reception
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
Received n°166: 150
Bad reception
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
Received n°194: 150

How to correct the client code to avoid missed frames ? Any hint for a better understanding of boost asio rationale is welcome

NGI
  • 852
  • 1
  • 12
  • 31

1 Answers1

1

I think in your code are data races. If timer is expired (timeout occures) before read operation is completed below code is executed:

if ( timeout ) {
                socket.cancel();
                timer.cancel();
                //Leave the io run_one loop
                break; // [1]
            }

you are breaking from while loop, socket.cancel() cancels asynchronus read operation, its handler with operation_aborted error is queued and waits for processing in event loop. Because you jumped from while loop, run_one is not invoked, and this handler is still in queue.

io_service.reset() doesn't clear queue. Handler for aborted operation is still there. And waits to be invoked. reset() only sets stopped flag of io_service to false, then handlers can be processed by calls run_one, one .. methods, you are using reset to restore processing handlers from queue.

So we have unprocessed handler in queue, in main while loop new vector vec is created, all its elements are initialized to 0. async_receive_from started (it is reading into vec and set ReceivedDataSize in its handler), then reset is called, run_one can process handler and invokes handler for aborted operation! and you are testing ReceivedDataSize and vec for aborted operation... but you should do it for the last started async operation.

I would rewrite clause with timeout to:

if ( timeout ) {
                socket.cancel();
                timer.cancel();
} // no break

after removing break we guarantee that aborted operation is processed by run_one and there is no outstanding handler to be invoked when new async operation is started. After this modifcation I have not seen bad reception while testing your code.

EDIT

Regarding your comment, yes the other break statement should also be removed from the code.

Output of the program is unpredictable because you are starting async operations which takes reference to local variable (vec is modified by async_receive_from), handler is queued, local variable is destroyed and later the handler is called from io_service while vec has been already destroyed.

You can test the code below, and see what happens:

  boost::asio::io_context io; // alias on io_service

  boost::asio::system_timer t1{io};
  t1.expires_from_now(std::chrono::seconds(1));

  boost::asio::system_timer t2{io};
  t2.expires_from_now(std::chrono::seconds(1));

  boost::asio::system_timer t3{io};
  t3.expires_from_now(std::chrono::seconds(1));

  t1.async_wait ([](const boost::system::error_code& ec){ cout << "[1]" << endl;});
  t2.async_wait ([](const boost::system::error_code& ec){ cout << "[2]" << endl;});
  t3.async_wait ([](const boost::system::error_code& ec){ cout << "[3]" << endl;});
  // 3 handlers are queueud
  cout << "num of handlers executed " << io.run_one() << endl; // wait for handler, print 1
  io.reset(); // RESET is called
  cout << "num of handlers executed " << io.run_one() << endl; // wait for handler, print 1
  io.reset(); // RESET is called
  cout << "num of handlers executed " << io.run_one() << endl; // wait for handler, print 1
  cout << "executed: " <<                io.poll_one() << endl; // call handler if any ready, print 0

we are calling io_service::reset but all handlers are executed. After removing breaks from your code, you ensure that all handlers will be performed, and it guarantees that the local data are valid when these handlers are called.

rafix07
  • 20,001
  • 3
  • 20
  • 33
  • I confirm that removing the break inside the timeout test makes the bug disappear from the MVCE on RHEL7 and CentOS 7 ( GCC 4.8.5 for both ) + disappear also from the real (complete) code. Many thanks for this and also for your explanation that seems clearer than all (the few) documentation I read. I still don't get all the story but with this new knowledge I will go again to further documentation reading. For the time being I just wonder about the validity of the second "break" of the reception test if the "break" of the timeout test was invalid !? – NGI Feb 05 '19 at 23:55