I'm using asio::async_read_until with '\n' delimiter to support a TCP client that fetches character data from a server. This server continuously sends '\n' terminated lines; precisely, it can write at once either single lines or a concatenated string of multiple lines.

From the doc, I understand that asio::async_read_until could read:

  • One '\n' terminated line, like "some_data\n". This is the simplest case, handled with a call to std::getline on the stream associated with the asio::streambuf
  • One '\n' terminated line plus the beginning of a next line, like "some_data1\nbla". This can be handled with a std::getline; the rest of the second line will be handled at the next completion handler call.
  • Many lines; in this case, the newly read data could contain 2 or more '\n'. How can I know how many std::getline calls I should do, knowing that I don't want to risk calling std::getline on an incomplete line (which I will eventually get in a future packet)? Should I peek at the stream buffer to check the existence of multiple '\n'? Is it even possible without doing many copies?
mamahuhu
  • For the second case, a call to `std::getline` would fetch "some_data1" and another call would fetch "bla" and set the eof bit. The thing is that I want to leave "bla" in the streambuf. – mamahuhu Nov 26 '15 at 12:59
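The behaviour described in this comment can be reproduced without a socket. The sketch below uses a plain std::stringbuf as a stand-in for the asio::streambuf (an assumption made only for illustration; the helper names drain_with_getline and is_empty are hypothetical). It shows that looping on std::getline also extracts the unterminated tail "bla" and leaves the buffer empty.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Calls std::getline until it fails and returns every string it extracted.
// A trailing partial line (no '\n') is extracted too: getline only sets
// eofbit in that case, so the loop condition is still true for it.
std::vector<std::string> drain_with_getline(std::stringbuf& buf) {
    std::istream is(&buf);
    std::vector<std::string> out;
    std::string line;
    while (std::getline(is, line)) {
        out.push_back(line);
    }
    return out;
}

// True when no characters are left to read from the buffer.
bool is_empty(std::stringbuf& buf) {
    return buf.sgetc() == std::stringbuf::traits_type::eof();
}
```

Feeding it a buffer holding "some_data1\nbla" yields two strings, "some_data1" and "bla", and nothing remains buffered for the next read to complete.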

3 Answers

From the documentation here:

http://www.boost.org/doc/libs/1_59_0/doc/html/boost_asio/reference/async_read_until/overload1.html

If the stream buffer already contains a newline, the handler will be invoked without an async_read_some operation being executed on the stream.

For this reason, when your handler executes you must execute no more than one getline(). Once getline has returned and you have finished processing, simply call async_read_until again from the handler.

example:

void handler(const boost::system::error_code& e, std::size_t size)
{
  if (e)
  {
    // handle error here
  }
  else
  {
    std::istream is(&b);
    std::string line; 
    std::getline(is, line);
    do_something(line);
    boost::asio::async_read_until(s, b, '\n', handler);
  }
}

// Call the async read operation
boost::asio::async_read_until(s, b, '\n', handler);
Richard Hodges
  • It seems I completely overlooked this remark in the documentation. Thanks a lot; my problem is solved. – mamahuhu Nov 26 '15 at 13:26
  • I think the style of documentation in `boost::asio` is incredibly terse and to the point. The author clearly believes that only the absolute minimum number of words should be used to convey meaning. I think I had to read the documentation three times before I understood how asio works and how to use it properly. I think this is a shame because the library is beautiful, but the documentation makes it difficult to learn. – Richard Hodges Nov 26 '15 at 17:51
2

This answer relates to the accepted answer:

I'd highly recommend calling std::getline() in a loop and testing the return value.

while (std::getline(is, line)) {
  ...
  do_something(line);
}

std::getline returns a reference to the istream, which can be implicitly converted to bool, indicating whether the getline operation actually succeeded.

Why you should do that:

  1. std::getline may fail, e.g. if the input stream has reached its end and no newline is present
  2. you may have more than one line inside asio's streambuf. If you blindly restart reading after processing just the first line, you may end up exceeding the memory limits on the streambuf (or have an ever-growing streambuf).

Update 2017-08-23:

bytes_transferred actually gives you the number of bytes in the streambuf up to and including the delimiter. You can take advantage of that by reading that many bytes directly from the streambuf's input sequence and creating a string from them.

void client::on_read(const std::error_code &ec, size_t bytes_transferred) {

    if (ec) {
        return handle_error(ec);
    }

    std::string line(
        asio::buffer_cast<const char*>(m_rxbuf.data()),
        bytes_transferred
    );

    // todo: strip off trailing delimiter

    m_rxbuf.consume(bytes_transferred); // don't forget to drain

    handle_command(line); // leave restarting async_read_until to this handler
}

Instead of copying data from the streambuf into the string, you can alternatively create a string_view from it, or replace the underlying streambuf with a std::string and chop off the first bytes_transferred bytes instead of consuming from the buffer.
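A minimal sketch of the std::string variant, assuming C++17 for std::string_view. The helper names peek_line and chop_line are hypothetical, and bytes_transferred is assumed to be the count reported to the read handler (never larger than the buffer). The view avoids a copy, but it is only valid until the buffer is modified.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Returns a view of the first bytes_transferred bytes of rxbuf with one
// trailing delimiter stripped. Precondition: bytes_transferred <= rxbuf.size().
// The view refers to rxbuf's storage and dangles once rxbuf changes.
std::string_view peek_line(const std::string& rxbuf,
                           std::size_t bytes_transferred,
                           char delim = '\n') {
    std::string_view line(rxbuf.data(), bytes_transferred);
    if (!line.empty() && line.back() == delim) {
        line.remove_suffix(1);
    }
    return line;
}

// Drops the line, delimiter included, from the front of the buffer.
// This plays the role of streambuf::consume(bytes_transferred).
void chop_line(std::string& rxbuf, std::size_t bytes_transferred) {
    rxbuf.erase(0, bytes_transferred);
}
```

For a buffer holding "cmd1\ncmd2\npar" and bytes_transferred equal to 5, peek_line gives "cmd1", and after chop_line the buffer holds "cmd2\npar". Process the view before calling chop_line.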

Cheers, Argonaut6x

argonaut6x
0

Updated: with a somewhat better approach.

IMHO, you are better off using async_read_some directly rather than the read until operation. This requires fewer operations overall, gives you better control over the buffer handling, and could reduce the number of copies you have to make of the data. You could use the asio::streambuf implementation, but you could also do this using a vector<char>, for example:

vector<char> buffer(2048); // whatever size you want, note: you'll need to somehow grow this if message length is greater...
size_t content = 0; // current content

// now the read operation;
void read() {
  // This will cause asio to append from the last location
  socket.async_read_some(boost::asio::buffer(buffer.data() + content, buffer.size() - content), [&](const boost::system::error_code& ec, size_t sz) {
    if (ec) return; // some error
    // Total content in the vector
    content += sz;
    auto is = begin(buffer);
    auto ie = next(is, content); // end of the data region

    // handle all the complete lines.
    for (auto it = find(is, ie, '\n'); it != ie; it = find(is, ie, '\n')) {
      // is -> it contains the message (excluding '\n')
      handle(is, it);
      // Skip the '\n'
      it = next(it);
      // Update the start of the next message
      is = it;
    }
    // Update the remaining content
    content -= distance(begin(buffer), is);
    // Move the remaining data to the beginning of the buffer
    copy(is, ie, begin(buffer));
    // Setup the next read
    read();
  });
}
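The scanning and compaction step inside the handler above can be tried on its own, without a socket. The sketch below is one way to factor it out; extract_lines is a hypothetical name, and returning the lines as strings (instead of calling handle on iterator pairs) is a simplification that costs a copy per line.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <string>
#include <vector>

// Scans the first `content` bytes of `buffer`, returns every complete line
// without its '\n', moves the unfinished tail to the front of the buffer and
// sets `content` to the length of that tail.
std::vector<std::string> extract_lines(std::vector<char>& buffer, std::size_t& content) {
    std::vector<std::string> lines;
    auto is = buffer.begin();
    const auto ie = std::next(is, static_cast<std::ptrdiff_t>(content));
    for (auto it = std::find(is, ie, '\n'); it != ie; it = std::find(is, ie, '\n')) {
        lines.emplace_back(is, it); // [is, it) is one line, '\n' excluded
        is = std::next(it);         // skip the '\n'
    }
    content = static_cast<std::size_t>(std::distance(is, ie));
    if (is != buffer.begin()) {
        // Destination lies before the source range, so std::copy is valid here.
        std::copy(is, ie, buffer.begin());
    }
    return lines;
}
```

With "ab\ncd\nef" in the buffer, this returns "ab" and "cd", leaves "ef" at the front and sets content to 2, so the next async_read_some appends right after the partial line. If the buffer fills up without containing a '\n', it still has to be grown before the next read, as noted in the comment on the buffer declaration.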
Nim
  • The problem is that `std::getline` will stop its extraction if EOF is reached, and return the characters up to EOF. The data does not remain in the buffer: that's the whole issue. I have double-checked it with a simple node.js TCP server that sends the different string configurations I've described in my question. – mamahuhu Nov 26 '15 at 12:50
  • My oversight, read_until requires that you call again for the next line. IMHO, this is suboptimal - see updated answer. – Nim Nov 26 '15 at 13:41
  • it's almost never a better approach to use `xxx_read_some`. The composed operations automatically shield you from the complexities of socket comms. – Richard Hodges Nov 26 '15 at 17:53
  • For what it is worth, `async_read_until` has optimized internal scheduling over multiple individual `async_read_some` operations. – Tanner Sansbury Nov 26 '15 at 17:53
  • @TannerSansbury, be that as it may (though I'd be interested to know where in the code this is happening,) you still need to *schedule* another `read_until` call (when you could just as easily scan the buffer in the callback to see if there are other complete messages - and thereby save yourself having to wait to process data you already have..) At the end of the day it's down to the use case - if you don't care, use composed operations - if you care, then `read_some` is the way forward... – Nim Nov 26 '15 at 23:20
  • ... so as the OP says, if there are indeed multiple messages received from a single "frame", the overhead of having to go through the scheduling of the next async operation to simply parse the data you already have (remember also that you end up having to do two scans through the data - in asio and in `getline()`) you have significantly more overhead with the composed operations. For toy applications - fine - for anything serious, not convinced... – Nim Nov 26 '15 at 23:25
  • @Nim The optimization occurs via the `asio_handler_is_continuation` hook. One does not have to parse the data multiple times when using `read_until`, as the `read_until` operation provides the number of bytes in the input sequence up to and including the delimiter. I agree that it is down to the use case. However, I do not have enough experience with Asio to make a definitive prescription and use profiling to aid in my decision on a case-by-case basis. – Tanner Sansbury Nov 27 '15 at 03:03
  • @TannerSansbury, will look into this, cursory look indicates it's not limited to `read_until` ops, it's applicable for other operations, needs a little more documentation I think. Definitely agree with profiling! Fundamentally my point is that the `read_some` call is what it all boils down to - so stripping away extra layers is only going to result in fewer operations. – Nim Nov 27 '15 at 08:47