4

I am using boost::asio and boost::thread to realize a message service which accepts messages, send them asynchronously if there is no message being processed or queues the message if there are messages being processed.

The message rate is to my mind high, about 2.000 messages per second. With so many messages I face corrupted message, very seldom though. In 2.000 messages about 4-8 are corrupted. I believe the problem is due to incorrect use of the boost::asio and/or boost::thread library.

The code I've implemented is mainly based on this boost tutorial. I cannot find a mistake and since the major messages work out find it's hard for me to narrow down the problem.

Maybe someone else has an idea what's going wrong here?

Basically this class is used in the following way:

(1) The constructor is called in the beginning of my program in order to launch the thread, thus the service to accept and transmit messages

(2) Whenever I want to transmit a message I make a call to MessageService::transmitMessage() which delegates the task with async_write to the thread processing the message queue.

using namespace google::protobuf::io;
using boost::asio::ip::tcp;

MessageService::MessageService(std::string ip, std::string port) :
    work(io_service), resolver(io_service), socket(io_service) {

    messageQueue = new std::deque<AgentMessage>;
    tcp::resolver::query query(ip, port);
    endpoint_iterator = resolver.resolve(query);

    tcp::endpoint endpoint = *endpoint_iterator;

    socket.async_connect(endpoint, boost::bind(&MessageService::handle_connect,
            this, boost::asio::placeholders::error, ++endpoint_iterator));

    boost::thread t(boost::bind(&boost::asio::io_service::run, &io_service));
}

void MessageService::await() {

    while (!messageQueue->empty()) {

        signal(SIGINT, exit);

        int messagesLeft = messageQueue->size();
        sleep(3);
        std::cout << "Pending Profiler Agents Messages: "
                << messageQueue->size() << std::endl;
        if (messagesLeft == messageQueue->size()) {
            std::cout << "Connection Error" << std::endl;
            break;
        }
    }

    std::cout << i << std::endl;
}

void MessageService::write(AgentMessage agentMessage, long systemTime,
        int JVM_ID) {
    agentMessage.set_timestamp(Agent::Helper::getCurrentClockCycle());
    agentMessage.set_jvm_id(JVM_ID);
    agentMessage.set_systemtime(systemTime);
    io_service.post(boost::bind(&MessageService::do_write, this, agentMessage));
}

void MessageService::do_close() {
    socket.close();
}

void MessageService::transmitMessage(AgentMessage agentMessage) {

    ++i;

    boost::asio::streambuf b;
    std::ostream os(&b);

    ZeroCopyOutputStream *raw_output = new OstreamOutputStream(&os);
    CodedOutputStream *coded_output = new CodedOutputStream(raw_output);

    coded_output->WriteVarint32(agentMessage.ByteSize());
    agentMessage.SerializeToCodedStream(coded_output);

    delete coded_output;
    delete raw_output;

    boost::system::error_code ignored_error;

    boost::asio::async_write(socket, b.data(), boost::bind(
            &MessageService::handle_write, this,
            boost::asio::placeholders::error));
}

void MessageService::do_write(AgentMessage agentMessage) {

    bool write_in_progress = !messageQueue->empty();
    messageQueue->push_back(agentMessage);

    if (!write_in_progress) {
        transmitMessage(agentMessage);
    }
}

void MessageService::handle_write(const boost::system::error_code &error) {

    if (!error) {
        messageQueue->pop_front();
        if (!messageQueue->empty()) {
            transmitMessage(messageQueue->front());
        }
    } else {
        std::cout << error << std::endl;
        do_close();
    }
}

void MessageService::handle_connect(const boost::system::error_code &error,
        tcp::resolver::iterator endpoint_iterator) {
    // can be used to receive commands from the Java profiler interface
}

MessageService::~MessageService() {
    // TODO Auto-generated destructor stub
}

The header file:

    using boost::asio::ip::tcp;

class MessageService {
public:
    MessageService(std::string ip, std::string port);
    virtual ~MessageService();
    void write(AgentMessage agentMessage, long systemTime, int JVM_ID);
    void await();

private:
    boost::asio::io_service io_service;
    boost::asio::io_service::work work;
    tcp::resolver resolver;
    tcp::resolver::iterator endpoint_iterator;
    tcp::socket socket;
    std::deque<AgentMessage> *messageQueue;

    void do_write(AgentMessage agentMessage);

    void do_close();

    void handle_write(const boost::system::error_code &error);

    void handle_connect(const boost::system::error_code &error,
            tcp::resolver::iterator endpoint_iterator);

    void transmitMessage(AgentMessage agentMessage);
};
Konrad Reiche
  • 27,743
  • 15
  • 106
  • 143
  • Since you mention `boost::thread` I can only assume that you are using more than one thread in the program, but you have no locking to ensure that access to the message queue (I am assuming that is the only shared resource) is safe --I must admit that I have not gone through the code, so maybe access to the queue is single threaded... you tell me – David Rodríguez - dribeas Jul 05 '11 at 08:58
  • I thought there is only one thread dealing with the messages and thus the queue. Since the only function, at least I think it is this way, which is called by multiple threads is transmitMessage and transmitMessage delegates with: io_service.post(boost::bind(&MessageService::do_write, this, agentMessage)); Which I thought is only executed by the one thread launched in the constructor. But maybe I am wrong. – Konrad Reiche Jul 05 '11 at 09:03
  • `transmitMessage` is not called by multiple threads, you have only a single thread dispatching on the io service. I can't see anything wrong with the code above (as long as you only ever call `write` from another thread - say main thread) - I would check the receiver code. – Nim Jul 05 '11 at 09:12
  • Oh I meant `write` yes, this function is called from the main thread and maybe other threads, too. – Konrad Reiche Jul 05 '11 at 09:15
  • in that case it's fine because you are `post`ing a handler to the io service to actually `push_back`. I'd hazard then that the problem lies in the receiver code. – Nim Jul 05 '11 at 09:23
  • @Nim: The receiver code is Java and a lot simple, at least to my mind, but yes - this is still possible. – Konrad Reiche Jul 05 '11 at 09:27
  • You can always capture the network traffic and analyze it to determine whether the packets on the wire are correct or not. – David Rodríguez - dribeas Jul 05 '11 at 09:31
  • I think std::ostream os; created on the stack is not correct ! You should create it on the heap. See http://stackoverflow.com/questions/5344809/boost-asio-async-send-question and http://comments.gmane.org/gmane.comp.lib.boost.asio.user/2161 Make the stream a member variable and pls let me know if this fixes your problem. – O.C. Jul 05 '11 at 11:38
  • 1
    I think I was wrong. Creating boost::asio::streambuf on the stack seems incorrect to me !. – O.C. Jul 05 '11 at 11:46
  • What leads you to believe this is a problem with `boost::thread`? – Sam Miller Jul 05 '11 at 21:34

1 Answers1

2

this method seems dubious to me

void MessageService::transmitMessage(AgentMessage agentMessage) {
    ++i;

    boost::asio::streambuf b;
    std::ostream os(&b);

    ZeroCopyOutputStream *raw_output = new OstreamOutputStream(&os);
    CodedOutputStream *coded_output = new CodedOutputStream(raw_output);

    coded_output->WriteVarint32(agentMessage.ByteSize());
    agentMessage.SerializeToCodedStream(coded_output);

    delete coded_output;
    delete raw_output;

    boost::system::error_code ignored_error;

    boost::asio::async_write(socket, b.data(), boost::bind(
            &MessageService::handle_write, this,
            boost::asio::placeholders::error));
}

You appear to be serializing an AgentMessage (which should be passed via const reference btw) into a streambuf. Yet this serialized data does is not guaranteed to exist until the async_write completion handler is invoked, which is explicitly described in the async_write documentation

buffers

One or more buffers containing the data to be written. Although the buffers object may be copied as necessary, ownership of the underlying memory blocks is retained by the caller, which must guarantee that they remain valid until the handler is called.

to solve this behavior, ensure the buffer remains in scope until the completion handler is invoked. One way to do this is to pass the buffer as an argument to your bounded completion handler:

boost::asio::async_write(socket, b.data(), boost::bind(
            &MessageService::handle_write, this,
            boost::asio::placeholders::error,
            coded_output
            // ^^^ buffer goes here
            ));

then delete it from within the completion handler. I'd suggest you also look at using a shared_ptr instead of naked pointers as well.

Sam Miller
  • 23,808
  • 4
  • 67
  • 87
  • So since async_write returns immediately, but it is not known when async_write completes and transmitMessage is left, it might be uncertain what happens to the allocated buffers? What would you to suggest to counter this issue? – Konrad Reiche Jul 05 '11 at 21:19
  • @platzhirsch the time when `async_write` completes is well known, this is when your completion handler is invoked. I've updated my answer with one possible solution. – Sam Miller Jul 05 '11 at 21:33