
I want to collect the body of every HTTP response, including responses that redirect elsewhere. I can use mechanisms outside the Fetch domain, such as Network.getResponseBody. That works fine for the "final" page in a chain of redirections, but not for the intermediate pages, because Chrome appears to discard the content when it moves on to the next redirection target.

So I called Fetch.enable({ patterns: [ { requestStage: "Response" } ] }) (from PHP, but the details of that are irrelevant, as you will see). The call returns no error. After a Page.navigate, I wait for a Fetch.requestPaused event, which contains the members requestId, responseStatusCode and responseHeaders, and then send Fetch.getResponseBody with the requestId from that event. The result depends on the status of the response. For a 200 I get a response body (hurray), but for a 30x (301, 302 etc.) I always get error code -32000 with the message "Can only get response body on requests captured after headers received". In my view that error message is inconsistent with the Fetch.requestPaused event data, which already carries the response headers, even if Chrome DevTools Protocol (CDP) was never intended to capture the bodies of redirect responses. By the way, pages that redirect through their content (via a META element or JavaScript) are captured fine, I assume because they return a 200 status code.
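
To make the distinction concrete, here is a minimal sketch of how the Fetch.requestPaused payload could be classified before deciding whether to call Fetch.getResponseBody. It assumes `params` has the shape described above (requestId, responseStatusCode, and responseHeaders as an array of name/value pairs). `describePausedResponse` and `REDIRECT_STATUSES` are hypothetical names for illustration, not part of CDP.

```javascript
// Sketch only: `params` is assumed to be the payload of a Fetch.requestPaused
// event delivered at the Response stage. Nothing here talks to Chrome.

// Statuses that normally make a browser follow the Location header.
const REDIRECT_STATUSES = new Set([301, 302, 303, 307, 308]);

function describePausedResponse(params) {
  const status = params.responseStatusCode;

  // HTTP header names are case-insensitive, so compare in lower case.
  const locationHeader = (params.responseHeaders || []).find(
    (header) => header.name.toLowerCase() === 'location'
  );

  return {
    requestId: params.requestId,
    status,
    // True for the statuses where Fetch.getResponseBody failed in my tests.
    isRedirect: REDIRECT_STATUSES.has(status),
    // Where the redirect points, or null when there is no Location header.
    location: locationHeader ? locationHeader.value : null,
  };
}
```

With something like this, the caller can at least record the status and target of each intermediate hop, even when the body itself is not available.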

So, is the problem in the sequence of calls I'm making or in the error message returned by Fetch.getResponseBody? And am I correct in assuming that CDP was not intended to capture the bodies of documents in a redirection chain (apart from the last one, obviously)?

Mark Bradley
  • getResponseBody apparently wants the request to be in a paused state but considers a redirected request as a different request (technically it is different AFAIK). If there's no way to pause on redirect automatically then it definitely looks like a bug in CDP or an architectural deficiency. – wOxxOm Dec 17 '20 at 15:02
  • Just noticed that in Chrom(ium), if I open DevTools and use the Network tab, the Response preview pane shows "no response data" (or a similar message), which supports the idea that it's an architectural deficiency. Interestingly, Firefox did something similar. Looks like I'll have to use cURL or similar. A shame, since that means two requests for each intermediate page: the original in CDP and the second in cURL – Mark Bradley Dec 17 '20 at 15:47

1 Answer


You need to continue the request on a 301/302 and let the browser follow it (there is no body in a redirect):

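    // Statuses for which the browser follows the Location header.
    const REDIRECT_CODES = [301, 302, 303, 307, 308];

    if (REDIRECT_CODES.includes(params.responseStatusCode)) {
      // Redirect: let the browser follow it instead of asking for a body.
      await this.#client.send('Fetch.continueRequest', {
        requestId,
      });
    } else {
      // Any other response: get the body while the request is paused.
      const responseCdp = await this.#client.send('Fetch.getResponseBody', {
        requestId,
      });

      // Fetch.fulfillRequest expects a base64-encoded body, so encode it
      // when Chrome returned plain text.
      const body = responseCdp.base64Encoded
        ? responseCdp.body
        : Buffer.from(responseCdp.body).toString('base64');

      await this.#client.send('Fetch.fulfillRequest', {
        requestId,
        responseCode: params.responseStatusCode,
        responseHeaders: params.responseHeaders,
        body,
      });
    }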
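
If the goal is to store the captured body rather than only pass it through, note that Fetch.getResponseBody returns an object with a `body` string and a `base64Encoded` flag. The following is a minimal sketch of turning that result into raw bytes, assuming a Node.js environment (for `Buffer`). `responseBodyToBuffer` is a hypothetical helper name, not a CDP or Puppeteer API.

```javascript
// Sketch only: `result` is assumed to be the object returned by
// Fetch.getResponseBody, i.e. { body: string, base64Encoded: boolean }.
function responseBodyToBuffer(result) {
  // Chrome sets base64Encoded when the body is sent as base64;
  // otherwise `body` is already plain text.
  const encoding = result.base64Encoded ? 'base64' : 'utf8';
  return Buffer.from(result.body, encoding);
}
```

The resulting Buffer can then be written to disk, or decoded with `toString('utf8')` when the content is known to be text.
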
AntonB
  • In my experience, although there *should* be no body in a redirect, there often is (it depends on the site owner / content and also on the capabilities of the web server: Apache, nginx, Tomcat, etc.). For my purposes, it is useful to capture that transient content as well as the "final" content. Thanks for the code snippet. – Mark Bradley Apr 29 '23 at 09:22