Tried to parse chunked transfer encoding, but the decoded file is totally unreadable

Question

I tried to parse data that a REST API sent with chunked transfer encoding. When I print the data as a string I can see that it has content, so I thought the parsing was working, but when I write it to a file, the file is totally unreadable. The code below uses the Boost library, and I explain my reasoning in the comments, starting from the response-handling part of my code. I have no idea what I have done wrong.

   // Send the request.
    boost::asio::write(socket, request);

    // Read the response status line. The response streambuf will automatically
    // grow to accommodate the entire line. The growth may be limited by passing
    // a maximum size to the streambuf constructor.
    boost::asio::streambuf response;
    boost::asio::read_until(socket, response, "\r\n");

    // Check that response is OK.
    std::istream response_stream(&response);
    std::string http_version;
    response_stream >> http_version;
    unsigned int status_code;
    response_stream >> status_code;
    std::string status_message;
    std::getline(response_stream, status_message);
    if (!response_stream || http_version.substr(0, 5) != "HTTP/")
    {
        //std::cout << "Invalid response\n";
        return 9002;
         
    }
    if (status_code != 200)
    {
        //std::cout << "Response returned with status code " << status_code << "\n";
        return 9003;
    }
    
    // Read the response headers, which are terminated by a blank line.
    boost::asio::read_until(socket, response, "\r\n\r\n");

    // Process the response headers.
    // Here I try to extract the file name from the Content-Disposition
    // header of the response.
    std::string header;
    std::string fullHeader = "";
    string zipfilename="", txtfilename="";
    bool foundfilename = false;
    while (std::getline(response_stream, header) && header != "\r")
    {
        fullHeader.append(header).append("\n");
        std::transform(header.begin(), header.end(), header.begin(),
            [](unsigned char c){ return std::tolower(c); });
        string containstr = "content-disposition";
        string containstr2 = "filename";
        string quotestr = "\"";
        if (header.find(containstr) != std::string::npos && header.find(containstr2) != std::string::npos)
        {
            int countquotes = 0;
            bool foundquote = true;
            
            std::size_t startpos = 0, beginpos, endpos;
            while (foundquote)
            {
                
                std::size_t myfound = header.find(quotestr, startpos);
                if (myfound != std::string::npos)
                {
                    if (countquotes % 2 == 0)
                        beginpos = myfound;
                    else
                    {
                        endpos = myfound;
                        foundfilename = true;
                    }

                    startpos = myfound + 1;
                    
                }
                else
                   foundquote = false;

                countquotes++;
            }

            if (endpos > beginpos && foundfilename)
            {
                size_t zipfileleng = endpos - beginpos;
                zipfilename = header.substr(beginpos+1, zipfileleng-1);
                txtfilename = header.substr(beginpos+1, zipfileleng-5);
            }
            else
                return 9004;

        }
    }

    if (foundfilename == false || zipfilename.length() == 0 || txtfilename.length() == 0)
        return 9005;

    // Once the zip file name has been found, get the data from the response
    // body. The response uses chunked transfer encoding, so I tried to parse
    // it. According to Wikipedia it is not complicated: the first line is the
    // length of the data, the next line is the data, and this repeats over
    // and over. So I split the whole body on "\r\n" into a vector<string>
    // and then read the data from that vector.

    // Write whatever content we already have to output.
    std::string fullResponse = "";
    if (response.size() > 0)
    {
        std::stringstream ss;
        ss << &response;
        fullResponse = ss.str();
     
    
    }
    // Try to split the entire response body into a vector<string>.

    vector<string> allresponsedata;
    split_regex(allresponsedata, fullResponse, boost::regex("(\r\n)+"));

    // Try to merge the data parts of the response.
    string zipfiledata;
    int myindex = 0;
    for (auto &x : allresponsedata) {
        std::cout << "Split: " << x << std::endl; // When I print it, I can see that x has a value.

        if (myindex % 2 != 0)
        {
            zipfiledata = zipfiledata + x; // Accumulate the data.
        }
        }


        myindex++;
    }
    
    // Try to write the data into a file.
    std::ofstream zipfilestream(zipfilename, ios::out | ios::binary);
    zipfilestream.write(zipfiledata.c_str(), zipfiledata.length());
    zipfilestream.close();

    // Afterwards the zip file is created, but it cannot be opened: the zip
    // utility reports that it is a damaged zip file.

I also tried another approach, the one from "slow http client based on boost::asio - (Chunked Transfer)", but that does not work either. Visual Studio says:

  1 IntelliSense: no instance of overloaded function "boost::asio::read" matches the argument list
        argument types are: (boost::asio::ip::tcp::socket, boost::asio::streambuf, boost::asio::detail::transfer_exactly_t, std::error_code)

It simply does not compile at this line:

size_t n = asio::read(socket, response, asio::transfer_exactly(chunk_bytes_to_read), error);

I have read the documentation for asio::transfer_exactly, but it has no example exactly like this: https://www.boost.org/doc/libs/1_57_0/doc/html/boost_asio/reference/transfer_exactly.html

Any ideas?

Any reason why you didn't use [Boost Beast that supports chunked encoding](https://stackoverflow.com/questions/66756691/i-tried-to-download-a-file-with-boost-asio-but-it-doesnt-work-it-just-looks-l#comment118011473_66759719)? — sehe, Mar 31 '21 at 16:01

sehe · Accepted Answer · 2021-04-01T12:02:42.193

I don't see you reading the format correctly: https://en.wikipedia.org/wiki/Chunked_transfer_encoding#Format

You need to read the chunk length (in hex) and any optional chunk extensions before accumulating the full response body.

It needs to be done before, because the sequence \r\n that you split on can easily appear inside the chunk data.
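To illustrate the difference, here is a minimal sketch of a length-driven decoder for a chunked body that is already fully in memory. The name `decode_chunked` is hypothetical (it is not part of Boost or of the question's code), chunk extensions are simply dropped, trailer headers are ignored, and the 8-hex-digit size limit is an arbitrary assumption to keep the arithmetic safe. Treat it as a sketch of the idea, not a complete HTTP implementation.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: decodes a complete chunked body held in `raw`.
// Returns false if the input is malformed or truncated.
bool decode_chunked(const std::string& raw, std::string& body) {
    body.clear();
    std::size_t pos = 0;
    for (;;) {
        // The chunk-size line ends at the first CRLF after pos.
        const std::size_t eol = raw.find("\r\n", pos);
        if (eol == std::string::npos) return false;

        // Drop optional chunk extensions (";name=value").
        std::string line = raw.substr(pos, eol - pos);
        line = line.substr(0, line.find(';'));
        if (line.empty() || line.size() > 8) return false; // assumed size limit

        // Parse the chunk size, which is written in hexadecimal.
        std::size_t size = 0;
        for (char c : line) {
            std::size_t digit;
            if (c >= '0' && c <= '9')      digit = static_cast<std::size_t>(c - '0');
            else if (c >= 'a' && c <= 'f') digit = static_cast<std::size_t>(c - 'a') + 10;
            else if (c >= 'A' && c <= 'F') digit = static_cast<std::size_t>(c - 'A') + 10;
            else return false;
            size = size * 16 + digit;
        }

        pos = eol + 2;
        if (size == 0) return true; // last chunk; trailers are ignored here

        // Take exactly `size` bytes, whatever they contain (CRLF included).
        const std::size_t remaining = raw.size() - pos;
        if (size > remaining || remaining - size < 2) return false;
        body.append(raw, pos, size);
        pos += size;

        // Every chunk's data is followed by a CRLF that is not part of it.
        if (raw.compare(pos, 2, "\r\n") != 0) return false;
        pos += 2;
    }
}
```

For example, `decode_chunked("6;ext=1\r\nab\r\ncd\r\n0\r\n\r\n", out)` leaves `ab\r\ncd` in `out`: the CRLF inside the chunk survives because the length, not the delimiter, decides where the data ends.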

Again, I recommend just using Beast's support, which makes it all as simple as

 http::response<http::string_body> response;
 boost::asio::streambuf buf;
 http::read(socket, buf, response);

And you will have the headers fully parsed, interpreted (including Trailer headers!) and the content in response.body() as a std::string.
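Since the goal in the question is a zip file on disk, the decoded body then only has to be written out byte for byte. A minimal sketch, assuming the body is held in a std::string (as `response.body()` is with `http::string_body`); the helper name `save_binary` is hypothetical:

```cpp
#include <fstream>
#include <string>

// Hypothetical helper: writes the bytes of `body` to `path` unchanged.
// Returns false if the file cannot be opened or the write fails.
bool save_binary(const std::string& path, const std::string& body) {
    // Binary mode prevents newline translation on platforms such as Windows.
    std::ofstream out(path, std::ios::out | std::ios::binary | std::ios::trunc);
    if (!out) return false;
    // Use data() and size() so embedded NUL bytes are written as well.
    out.write(body.data(), static_cast<std::streamsize>(body.size()));
    return static_cast<bool>(out.flush());
}
```

With the Beast snippet above this would be called as `save_binary(zipfilename, response.body())`, where `zipfilename` is the name taken from the Content-Disposition header as in the question.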

It will do the right thing even if the server doesn't use chunked encoding or combines it with other encoding options.

There's simply no reason to reinvent the wheel.

Full Demo

This demonstrates with the Chunked Encoding test url from https://jigsaw.w3.org/HTTP/:

#include <boost/asio.hpp>
#include <boost/beast.hpp>
#include <iostream>
namespace http = boost::beast::http;
using boost::asio::ip::tcp;

int main() {
    http::response<http::string_body> response;

    boost::asio::io_context ctx;
    tcp::socket socket(ctx);

    connect(socket, tcp::resolver{ctx}.resolve("jigsaw.w3.org", "http"));

    http::write(
            socket,
            http::request<http::empty_body>(
                http::verb::get, "/HTTP/ChunkedScript", 11));

    boost::asio::streambuf buf;
    http::read(socket, buf, response);

    std::cout << response.body() << "\n";
    std::cout << "Effective headers are:" << response.base() << "\n";
}

Printing

This output will be chunked encoded by the server, if your client is HTTP/1.1
Below this line, is 1000 repeated lines of 0-9.
-------------------------------------------------------------------------
01234567890123456789012345678901234567890123456789012345678901234567890
01234567890123456789012345678901234567890123456789012345678901234567890
...996 lines removed ...
01234567890123456789012345678901234567890123456789012345678901234567890
01234567890123456789012345678901234567890123456789012345678901234567890

Effective headers are:HTTP/1.1 200 OK
cache-control: max-age=0
date: Wed, 31 Mar 2021 20:09:50 GMT
transfer-encoding: chunked
content-type: text/plain
etag: "1j3k6u8:tikt981g"
expires: Wed, 31 Mar 2021 20:09:49 GMT
last-modified: Mon, 18 Mar 2002 14:28:02 GMT
server: Jigsaw/2.3.0-beta3

@BenVoigt You're missing out on a big thing: https://www.boost.org/doc/libs/1_75_0/libs/beast/doc/html/index.html - Since boost 1.66.0 already! — sehe, Mar 31 '21 at 18:08
I'm assuming that your example snippet needs some #includes not present in the question? — Ben Voigt, Mar 31 '21 at 19:17
And (from Beast example code) also a `namespace http = beast::http;` With that it becomes clear that the example actually uses Beast. — Ben Voigt, Mar 31 '21 at 19:24
@BenVoigt I was taking a shortcut. I linked a complete example from the comment at an earlier answer. (See [here](https://stackoverflow.com/questions/66889515/tried-to-parse-chunked-transfer-encoding-its-not-working-though-the-file-which/66891423?noredirect=1#comment118241386_66889515)). Really, how to do it with Beast in no way answers the question here, so it's a by-line here. — sehe, Mar 31 '21 at 20:01
@BenVoigt That said, it wasn't too hard once I found a suitable [online test page](https://jigsaw.w3.org/HTTP/) to use. Added — sehe, Mar 31 '21 at 20:10
@Ken Does that mean you'll be using Beast? If so, what should I have said differently the other time to make you consider that? It seems that would have saved a lot of time. — sehe, Apr 01 '21 at 01:55
