
I need to pull data from roughly 6000 pages of a website. After doing some research, I decided to give WinHTTP a shot. I got it working, but I was doing things synchronously, so it took a while to complete. I am now attempting to use WinHTTP asynchronously, but I've hit a roadblock. I searched for tutorials and examples, but the only thing I could find was the MSDN documentation, which seems overly complex for what I'm doing, so I went ahead and gave it a shot myself:

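```cpp
// Declared at namespace scope (not inside the function): HttpCallback also
// writes it, and WinHTTP calls HttpCallback on its own worker threads.
// std::atomic<int> would be the portable choice. 0 = waiting for the callback.
volatile int theRequest = 1;

std::string theSource = "";
char * httpBuffer;
DWORD dwSize = 1;
DWORD dwRecv = 1;

HINTERNET hOpen = 
           WinHttpOpen
           (
               L"Example Agent", 
               WINHTTP_ACCESS_TYPE_NO_PROXY, 
               WINHTTP_NO_PROXY_NAME, 
               WINHTTP_NO_PROXY_BYPASS, 
               WINHTTP_FLAG_ASYNC
           );

WINHTTP_STATUS_CALLBACK theCallback = 
           WinHttpSetStatusCallback
           (
               hOpen, 
               (WINHTTP_STATUS_CALLBACK) HttpCallback,
               WINHTTP_CALLBACK_FLAG_ALL_NOTIFICATIONS,
               NULL
           );

HINTERNET hConnect = 
           WinHttpConnect
           (
               hOpen, 
               L"example.org",
               INTERNET_DEFAULT_HTTPS_PORT, 
               0
           );

HINTERNET hRequest = NULL;

BOOL allComplete = FALSE;

while (!allComplete)
{
    if (theRequest == 1)
    {
        hRequest = WinHttpOpenRequest
                   (
                       hConnect,
                       L"GET", 
                       L"example.html",
                       NULL,
                       WINHTTP_NO_REFERER, 
                       WINHTTP_DEFAULT_ACCEPT_TYPES, 
                       WINHTTP_FLAG_SECURE
                   );

        // Switch to "waiting" before sending; otherwise the loop keeps opening
        // and sending new requests until the callback changes theRequest.
        theRequest = 0;

        WinHttpSendRequest
        (
            hRequest, 
            WINHTTP_NO_ADDITIONAL_HEADERS, 
            0, 
            WINHTTP_NO_REQUEST_DATA, 
            0, 
            0, 
            0
        );
    }
    else if (theRequest == 2)
    {
        theRequest = 0;
        WinHttpReceiveResponse(hRequest, NULL);
    }
    else if (theRequest == 3)
    {
        theRequest = 0;

        // First call: NULL buffer, so dwSize receives the required size in bytes.
        WinHttpQueryHeaders
        (
            hRequest, 
            WINHTTP_QUERY_RAW_HEADERS_CRLF, 
            WINHTTP_HEADER_NAME_BY_INDEX, 
            NULL, 
            &dwSize, 
            WINHTTP_NO_HEADER_INDEX
        );

        WCHAR * headerBuffer = new WCHAR[dwSize/sizeof(WCHAR)];

        WinHttpQueryHeaders
        (
            hRequest, 
            WINHTTP_QUERY_RAW_HEADERS_CRLF, 
            WINHTTP_HEADER_NAME_BY_INDEX, 
            headerBuffer, 
            &dwSize, 
            WINHTTP_NO_HEADER_INDEX
        );

        delete [] headerBuffer;

        // Caveat: with WINHTTP_FLAG_ASYNC, WinHttpQueryDataAvailable and
        // WinHttpReadData complete later, through WINHTTP_CALLBACK_STATUS_DATA_AVAILABLE
        // and WINHTTP_CALLBACK_STATUS_READ_COMPLETE. MSDN says to pass NULL for their
        // byte-count parameters in that mode and to keep the buffer alive until the
        // read completes, so this synchronous-style loop is not reliable.
        dwSize = 1;

        while (dwSize > 0)
        {
            if (!WinHttpQueryDataAvailable(hRequest, &dwSize))
            {
                break;
            }

            httpBuffer = new char[dwSize + 1];

            ZeroMemory(httpBuffer, dwSize + 1);

            if (!WinHttpReadData(hRequest, httpBuffer, dwSize, &dwRecv))
            {
                std::cout << "WinHttpReadData() - Error Code: " << GetLastError() << "\n";
            }
            else
            {
                theSource = theSource + httpBuffer;
            }

            delete [] httpBuffer;

            // Parse the source for the data I'm looking for.

            break;
        }

        // Leave the outer loop once this page has been handled.
        allComplete = TRUE;
    }
}

WinHttpCloseHandle(hRequest);
WinHttpCloseHandle(hConnect);
WinHttpCloseHandle(hOpen);
```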

Below is my callback function:

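```cpp
// Signature matches WINHTTP_STATUS_CALLBACK (dwContext is a DWORD_PTR, not a DWORD*).
void CALLBACK HttpCallback(HINTERNET hInternet, DWORD_PTR dwContext, DWORD dwInternetStatus, LPVOID lpvStatusInfo, DWORD dwStatusInfoLength)
{
    switch (dwInternetStatus)
    {
        case WINHTTP_CALLBACK_STATUS_HANDLE_CREATED:
            // Also fires for hConnect and every hRequest, so it must not reset the state.
            std::cout << "Handle created.\n";
            break;

        // WinHttpSendRequest has completed; WinHttpReceiveResponse may now be called.
        case WINHTTP_CALLBACK_STATUS_SENDREQUEST_COMPLETE:
            std::cout << "Request sent.\n";
            theRequest = 2;
            break;

        // WinHttpReceiveResponse has completed; the headers can now be queried.
        case WINHTTP_CALLBACK_STATUS_HEADERS_AVAILABLE:
            std::cout << "Response received.\n";
            theRequest = 3;
            break;

        default:
            std::cout << dwInternetStatus << "\n";
            break;
    }
}
```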

Note: I've only provided this section of my code since it's the part that pertains to my question/problem. I apologize if a variable declaration is missing.

The above code works for me and does in fact get the information I'm looking for, but only for a single page. After getting to this point, I realized I had no idea how to make multiple requests with this method. Again, searching didn't turn up much besides the MSDN articles, which, as far as I can tell, don't include examples that make multiple requests at once. Additionally, the while loop I'm using to open/send/etc. the requests based on the value of `theRequest` seems like a terrible way of doing this. I'd appreciate any other advice on improving my code as well.
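One way to avoid spinning in `while (!allComplete)` is to let the main thread block until every request has reported that it is finished. This is a minimal sketch, assuming the total number of requests is known up front (about 6000 here) and that each request's callback reports exactly once, whether it succeeded or failed. `CompletionLatch`, `CountDown` and `Wait` are hypothetical names; C++20's `std::latch` (`count_down()` / `wait()`) offers the same behavior where your compiler supports it.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>

// Hypothetical replacement for busy-waiting in `while (!allComplete)`:
// the main thread sleeps until every request has reported that it finished.
class CompletionLatch {
public:
    explicit CompletionLatch(std::size_t count) : remaining_(count) {}

    // Call exactly once per request (success or failure), e.g. from HttpCallback.
    // Safe to call from WinHTTP's worker threads.
    void CountDown() {
        std::lock_guard<std::mutex> lock(mutex_);
        assert(remaining_ > 0 && "CountDown called more times than there are requests");
        if (remaining_ == 0) return;  // tolerate extra calls in release builds
        if (--remaining_ == 0) {
            allDone_.notify_all();    // wake the thread blocked in Wait()
        }
    }

    // Blocks without spinning until the count reaches zero.
    void Wait() {
        std::unique_lock<std::mutex> lock(mutex_);
        allDone_.wait(lock, [this] { return remaining_ == 0; });
    }

    std::size_t Remaining() {
        std::lock_guard<std::mutex> lock(mutex_);
        return remaining_;
    }

private:
    std::mutex mutex_;
    std::condition_variable allDone_;
    std::size_t remaining_;
};
```

The main thread would construct it with the number of requests, start them, and call `Wait()` once. The callback would call `CountDown()` when a response has been fully read (`WINHTTP_CALLBACK_STATUS_DATA_AVAILABLE` reporting 0 bytes) or when `WINHTTP_CALLBACK_STATUS_REQUEST_ERROR` arrives.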

In summary: I need to make about 6000 GET requests using WinHTTP asynchronously. Because I'm new to WinHTTP, I'm not sure how to do this, so I'm looking for the simplest (or, ideally, the most efficient) way to work with multiple asynchronous requests.

Jamal Winters
  • Did you see this MSDN article: [Asynchronous WinHTTP](http://msdn.microsoft.com/en-us/magazine/cc716528.aspx)? And maybe this single-threaded, multi-socket and easy-to-understand [Perl API `HTTP::Async`](https://metacpan.org/module/HTTP::Async) can give you some inspiration on how to proceed. – Lumi May 19 '12 at 14:11

1 Answer


Repeat what you are doing in `while (!allComplete) { ... }` to fire off more requests. You can reuse `hConnect`, but you need to call `WinHttpOpenRequest` for every resource you request.
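A minimal sketch of that idea, assuming all ~6000 pages live on the same host: one `hConnect` is shared, and each path gets its own request handle. Rather than opening every request up front, this hypothetical `RequestScheduler` keeps a bounded number in flight (`maxInFlight` is an assumed tuning knob, not something WinHTTP requires) and starts the next path when one finishes. The WinHTTP calls themselves are left to the `StartFn` callback, where a real program would call `WinHttpOpenRequest` on the shared `hConnect` and then `WinHttpSendRequest`, passing a per-request context as its `dwContext` argument; the bookkeeping below is plain, portable C++.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <mutex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: hands out queued paths so that at most maxInFlight
// requests are outstanding at once on the shared connection.
class RequestScheduler {
public:
    // In a real program: WinHttpOpenRequest(hConnect, L"GET", path.c_str(), ...)
    // followed by WinHttpSendRequest(...).
    using StartFn = std::function<void(const std::wstring& path)>;

    RequestScheduler(std::vector<std::wstring> paths, std::size_t maxInFlight, StartFn start)
        : pending_(paths.begin(), paths.end()),
          maxInFlight_(maxInFlight),
          start_(std::move(start)) {}

    // Start queued requests until the in-flight limit is reached.
    void Pump() {
        for (;;) {
            std::wstring path;
            {
                std::lock_guard<std::mutex> lock(mutex_);
                if (inFlight_ >= maxInFlight_ || pending_.empty()) return;
                path = std::move(pending_.front());
                pending_.pop_front();
                ++inFlight_;
            }
            start_(path);  // called without holding the lock
        }
    }

    // Call once per request when it has finished (success or failure), e.g. from
    // the WinHTTP callback; frees a slot and starts the next queued path.
    void OnFinished() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            assert(inFlight_ > 0 && "OnFinished called more often than requests were started");
            --inFlight_;
            ++finished_;
        }
        Pump();
    }

    std::size_t InFlight() const { std::lock_guard<std::mutex> lock(mutex_); return inFlight_; }
    std::size_t Finished() const { std::lock_guard<std::mutex> lock(mutex_); return finished_; }
    bool Done() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return pending_.empty() && inFlight_ == 0;
    }

private:
    std::deque<std::wstring> pending_;
    std::size_t maxInFlight_;
    std::size_t inFlight_ = 0;
    std::size_t finished_ = 0;
    StartFn start_;
    mutable std::mutex mutex_;
};
```

Usage would be: construct it with the list of paths, a limit such as 8, and a `StartFn` that opens and sends the request; call `Pump()` once; call `OnFinished()` from the callback whenever a request completes or fails; and wait until `Done()` is true. The mutex is there because WinHTTP invokes the callback on its own worker threads, and `StartFn` runs without the lock held so that a callback arriving while it is still executing cannot deadlock.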

Roman R.
  • Could you elaborate or provide some example code? The latter would be the most helpful for me if possible. I'm failing to see how I can incorporate additional requests into my loop and still monitor them appropriately. I won't be able to depend on the value of `theRequest` for querying headers or reading data, since with the current code every request's callback can modify that value at any time. – Jamal Winters May 19 '12 at 09:18
  • `theRequest` cannot be global in this scenario; it has to be specific to a particular `hRequest`. So you will need some sort of collection of structs holding `theRequest` + `hRequest` values. – Roman R. May 19 '12 at 11:12
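A minimal sketch of that suggestion: instead of one global `theRequest`, each request carries its own state, buffer and collected body in a struct, and the callback advances only that request. To keep the bookkeeping testable on its own this is portable C++; the WinHTTP calls appear only in comments, and `RequestContext`, `OnNotification`, `Notification` and `Action` are hypothetical names. It follows the completion notifications asynchronous WinHTTP documents for each step (`SENDREQUEST_COMPLETE`, `HEADERS_AVAILABLE`, `DATA_AVAILABLE`, `READ_COMPLETE`) rather than the informational `REQUEST_SENT` / `RESPONSE_RECEIVED` ones used in the question.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Local stand-ins for the WinHTTP notifications a request handler needs
// (WINHTTP_CALLBACK_STATUS_SENDREQUEST_COMPLETE, _HEADERS_AVAILABLE,
// _DATA_AVAILABLE, _READ_COMPLETE, _REQUEST_ERROR).
enum class Notification { SendRequestComplete, HeadersAvailable, DataAvailable, ReadComplete, RequestError };

// The WinHTTP call the callback should make next for this request.
enum class Action { ReceiveResponse, QueryDataAvailable, ReadData, Finish, Fail };

// Hypothetical per-request state: one instance per hRequest, passed as the
// dwContext argument of WinHttpSendRequest and recovered in the callback with
// reinterpret_cast<RequestContext*>(dwContext).
struct RequestContext {
    std::wstring path;            // resource requested with WinHttpOpenRequest
    void* hRequest = nullptr;     // the request's HINTERNET (a void* typedef)
    std::vector<char> buffer;     // must stay alive until READ_COMPLETE arrives
    std::string body;             // response bytes collected so far
    bool done = false;
};

// Advances one request's state machine. `length` is the byte count WinHTTP
// reports: *(DWORD*)lpvStatusInformation for DATA_AVAILABLE, and
// dwStatusInformationLength for READ_COMPLETE.
Action OnNotification(RequestContext& ctx, Notification n, std::size_t length) {
    switch (n) {
    case Notification::SendRequestComplete:
        return Action::ReceiveResponse;     // WinHttpReceiveResponse(h, NULL)
    case Notification::HeadersAvailable:
        return Action::QueryDataAvailable;  // WinHttpQueryDataAvailable(h, NULL)
    case Notification::DataAvailable:
        if (length == 0) {                  // no more data: response complete
            ctx.done = true;
            return Action::Finish;          // parse ctx.body, WinHttpCloseHandle(h)
        }
        ctx.buffer.resize(length);
        return Action::ReadData;            // WinHttpReadData(h, ctx.buffer.data(), length, NULL)
    case Notification::ReadComplete:
        assert(length <= ctx.buffer.size());
        ctx.body.append(ctx.buffer.data(), length);
        return Action::QueryDataAvailable;  // ask for the next chunk
    case Notification::RequestError:
        return Action::Fail;                // close the handle, count it as finished
    }
    return Action::Fail;
}
```

In the real `HttpCallback` you would convert `dwContext` back to a `RequestContext*` (ignoring notifications that arrive with a zero context), map `dwInternetStatus` onto `Notification`, make the WinHTTP call named by the returned `Action`, and release the context once `WINHTTP_CALLBACK_STATUS_HANDLE_CLOSING` arrives for that handle. Because every request reads into its own buffer and appends to its own body, many requests can be outstanding on the shared `hConnect` without overwriting each other's state.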