
I have a response parser `boost::beast::http::parser<false, boost::beast::http::buffer_body>`. As I understand it, `buffer_body` means the response body data should be stored in a user-provided buffer. But when I set a chunk callback on the parser using its `on_chunk_body` method, the parser does not seem to use the provided buffer; in that case it even works when no buffer is provided at all.

So I need to understand how the HTTP parser manages memory when it receives a chunk. Does it use some internal buffer?

It seems like the parser only uses the provided buffer for non-chunked responses. If so, is it correct to provide no buffer for chunked responses?

  • Yeah, using `buffer_body` is notoriously tricky to get right. I **think** I have some answers up on this site or the Beast issue tracker. I'll find them later tonight. – sehe Mar 21 '23 at 16:13
  • @sehe for me, buffer_body works when I read an HTTP request body or a non-chunked response. In those cases I get error::need_buffer for small buffers and reset the buffer to continue reading. But in the chunked case I do not get this error even though the buffer is very small (2 bytes). The documentation says I should initialize the buffer, and I can't understand why I need it for chunked responses if the parser does not use it. – Samuel Smith Mar 21 '23 at 16:19
  • Sorry, you're saying words, and I'm sure you mean something, but I cannot figure out what you mean. Luckily, we both speak code, so, I'll just [show, don't tell](https://stackoverflow.com/a/75807592/85371). When you digest that answer, you will no doubt understand what you were doing wrong/differently. – sehe Mar 22 '23 at 00:54

1 Answer


Beast supports chunked encoding. You do not need to deal with it. Let's demonstrate that by downloading a chunked response from httpbin.org:

First With vector_body

To remove the confusing part:

Live On Coliru

void using_vector_body() {
    tcp::socket conn = send_get();

    http::response<http::vector_body<uint8_t>> res;
    beast::flat_buffer buf;
    read(conn, buf, res);

    std::cout << "response: " << res.base() << "\n";

    std::span body = res.body();
    fmt::print("body, {} bytes: {::0x} ... {::0x}\n", body.size(), body.first(10), body.last(10));

    auto checksum = reduce(begin(body), end(body), 0ull, std::bit_xor<>{});
    fmt::print("body checksum: {:#0x}\n", checksum);
}

Prints e.g.

response: HTTP/1.1 200 OK
Date: Wed, 22 Mar 2023 00:43:25 GMT
Content-Type: application/octet-stream
Transfer-Encoding: chunked
Connection: keep-alive
Server: gunicorn/19.9.0
Access-Control-Allow-Origin: *
Access-Control-Allow-Credentials: true

body, 2000 bytes: [39, c, 8c, 7d, 72, 47, 34, 2c, d8, 10] ... [fd, 18, b0, c3, a3, d5, d1, 4c, 99, c0]
body checksum: 0xa1

Converting To buffer_body

We need to use the parser interface because it will drive the body-reader, and we need to monitor is_done() on the parser.

For good style, we may replace the initial

read(conn, buf, p, ec);

with the more intentional:

read_header(conn, buf, p, ec);

We will receive need_buffer errors, so we need to deal with them. Then, we need to repeatedly set the body reader value to our buffer, and see what was actually decoded.

NOTE Do not use the returned bytes_transferred from the [async_]read call here, because that will include everything from the wire, including the chunk header and trailer headers.

The interface for calculating the number of decoded body bytes is very unfriendly, but it is what you need.

Without further ado:

void using_buffer_body() {
    tcp::socket conn = send_get();

    http::response_parser<http::buffer_body> p;
    auto& res      = p.get(); // convenience shorthands
    auto& body_val = res.body();

    beast::flat_buffer buf;
    error_code ec;
    read_header(conn, buf, p, ec);
    //read(conn, buf, p, ec);

    if (ec && ec != http::error::need_buffer) // expected
        throw boost::system::system_error(ec);

    assert(p.is_header_done());
    std::cout << "\n---\nresponse headers: " << res.base() << std::endl;

    size_t checksum = 0;
    size_t n = 0;

    while (!p.is_done()) {
        std::array<uint8_t, 512> block;
        body_val.data = block.data();
        body_val.size = block.size();
        read(conn, buf, p, ec);

        if (ec && ec != http::error::need_buffer) // expected
            throw boost::system::system_error(ec);

        auto curr = block.size() - body_val.size;
        n += curr;

        std::cout << "parsed " << curr << " body bytes\n";

        for (auto b : std::span(block).first(curr))
            checksum ^= b;
    }

    fmt::print("body, {} bytes streaming decoded, chunked? {}\n", n, p.chunked());
    fmt::print("body checksum: {:#0x}\n", checksum);
}

Full Live Demo

The demo confirms that both methods result in the same body length with the same checksum:

Live On Coliru

#include <boost/beast.hpp>
#include <fmt/ranges.h>
#include <iostream>
#include <span>
namespace net   = boost::asio;
namespace beast = boost::beast;
namespace http  = beast::http;
using boost::system::error_code;
using net::ip::tcp;

tcp::socket send_get() {
    net::system_executor ex;
    tcp::socket          s(ex);
    connect(s, tcp::resolver(ex).resolve("httpbin.org", "http"));

    http::request<http::empty_body> req{http::verb::get, "/stream-bytes/2000?seed=42", 11};
    req.set(http::field::host, "httpbin.org");
    write(s, req);

    return s;
}

void using_vector_body() {
    tcp::socket conn = send_get();

    http::response<http::vector_body<uint8_t>> res;
    beast::flat_buffer buf;
    read(conn, buf, res);

    std::cout << "response: " << res.base() << "\n";

    std::span body = res.body();
    size_t const n = body.size();
    fmt::print("body, {} bytes: {::0x} ... {::0x}\n", n, body.first(10), body.last(10));

    auto checksum = reduce(begin(body), end(body), 0ull, std::bit_xor<>{});
    fmt::print("body checksum: {:#0x}\n", checksum);
}

void using_buffer_body() {
    tcp::socket conn = send_get();

    http::response_parser<http::buffer_body> p;
    auto& res      = p.get(); // convenience shorthands
    auto& body_val = res.body();

    beast::flat_buffer buf;
    error_code ec;
    read_header(conn, buf, p, ec);
    //read(conn, buf, p, ec);

    if (ec && ec != http::error::need_buffer) // expected
        throw boost::system::system_error(ec);

    assert(p.is_header_done());
    std::cout << "\n---\nresponse headers: " << res.base() << std::endl;

    size_t checksum = 0;
    size_t n = 0;

    while (!p.is_done()) {
        std::array<uint8_t, 512> block;
        body_val.data = block.data();
        body_val.size = block.size();
        read(conn, buf, p, ec);

        if (ec && ec != http::error::need_buffer) // expected
            throw boost::system::system_error(ec);

        size_t decoded = block.size() - body_val.size;
        n += decoded;

        std::cout << "parsed " << decoded << " body bytes\n";

        for (auto b : std::span(block).first(decoded))
            checksum ^= b;
    }

    fmt::print("body, {} bytes streaming decoded, chunked? {}\n", n, p.chunked());
    fmt::print("body checksum: {:#0x}\n", checksum);
}

int main() {
    using_vector_body();
    using_buffer_body();
}

Prints e.g.

response: HTTP/1.1 200 OK
Date: Wed, 22 Mar 2023 00:52:32 GMT
Content-Type: application/octet-stream
Transfer-Encoding: chunked
Connection: keep-alive
Server: gunicorn/19.9.0
Access-Control-Allow-Origin: *
Access-Control-Allow-Credentials: true

body, 2000 bytes: [39, c, 8c, 7d, 72, 47, 34, 2c, d8, 10] ... [fd, 18, b0, c3, a3, d5, d1, 4c, 99, c0]
body checksum: 0xa1

---
response headers: HTTP/1.1 200 OK
Date: Wed, 22 Mar 2023 00:52:32 GMT
Content-Type: application/octet-stream
Transfer-Encoding: chunked
Connection: keep-alive
Server: gunicorn/19.9.0
Access-Control-Allow-Origin: *
Access-Control-Allow-Credentials: true

parsed 512 body bytes
parsed 512 body bytes
parsed 512 body bytes
parsed 464 body bytes
body, 2000 bytes streaming decoded, chunked? true
body checksum: 0xa1
sehe
  • Thank you! But what about async reading with the http::parser::on_chunk_body callback? When I read via the async_ functions (http::async_read_header for the headers, http::async_read for the body) with an on_chunk_body callback, the callback is invoked every time a new chunk is available. That is fine, but in this case it seems Beast does not use the user-provided buffer. For example, in your synchronous case you get error::need_buffer if your buffer is small, but in the case I described there is no error::need_buffer even though the buffer is very small (char buf[2]). – Samuel Smith Mar 22 '23 at 07:56
  • I want to know why I do not get the error::need_buffer error in this case even though the buffer is very small, and how Beast manages memory here. – Samuel Smith Mar 22 '23 at 07:56
  • In other words, can you please provide an example in which async_read_header and async_read are used to read HTTP with an on_chunk_body callback and a parser whose body type is buffer_body, and show how I can get the error::need_buffer error in that case? – Samuel Smith Mar 22 '23 at 08:04
  • I only get error::need_buffer when on_chunk_body is not provided, and I do not understand why I do not get it when the callback is provided. – Samuel Smith Mar 22 '23 at 08:23
  • The whole point is that you do not need the callbacks when using `buffer_body`. Of course, if you **show** your code, I can see what it is doing. I think I've written enough working code, so let's make it your turn? If you are going to ask "how to use the chunk callbacks", please make that a separate question. I'm happy to dive in there too. – sehe Mar 22 '23 at 10:12
  • Sure, I already created a separate question, but it is stuck in the staging ground: https://stackoverflow.com/staging-ground/75811628 – Samuel Smith Mar 22 '23 at 11:37
  • Re: "For example, in your synchronous case" - that's missing the forest for the trees. Sync/async makes no difference: http://coliru.stacked-crooked.com/a/d61d55c22e239784 If you can write it sync, you can write it async. It's the exact same code, just a whole lot harder to read. – sehe Mar 22 '23 at 11:44