
I want to read the request cookies during a test written with Puppeteer. However, most of the requests I inspect have only the Referer and User-Agent headers. When I look at the same requests in Chrome DevTools, they have many more headers, including Cookie. To reproduce, copy and paste the code below into https://try-puppeteer.appspot.com/.

const browser = await puppeteer.launch();
const page = await browser.newPage();

page.on('request', function(request) {
  // headers() is a method in current Puppeteer releases; it was a property in early versions.
  console.log(JSON.stringify(request.headers(), null, 2));
});

// 'networkidle' was later split into 'networkidle0' and 'networkidle2'.
await page.goto('https://google.com/', {waitUntil: 'networkidle2'});

await browser.close();

Is there a restriction on which request headers can be accessed? Is this a limitation of Chrome itself or of Puppeteer?

Thanks for suggestions!

Bardt
  • Also related - [Headers in Puppeteer are not same as in browser](https://stackoverflow.com/questions/62336825/headers-in-puppeteer-are-not-same-as-in-browser) – Sergey Geron Jan 28 '22 at 06:13

2 Answers


I also saw this when I was trying to use Puppeteer to test some CORS behaviour - I found the Origin header was missing from some requests.

Looking through the GitHub issues, I found one noting that Puppeteer does not listen to the Network.responseReceivedExtraInfo event of the underlying Chrome DevTools Protocol. This event provides extra response headers that are not available in the Network.responseReceived event. There is a similar Network.requestWillBeSentExtraInfo event for requests.

Hooking up to these events seemed to get me all the headers I needed. Here is some sample code which captures the data from all these events and merges it into a single object keyed by request ID:

// Setup.
const browser = await puppeteer.launch()
const page = await browser.newPage()
const cdpRequestDataRaw = await setupLoggingOfAllNetworkData(page)

// Make requests.
await page.goto('http://google.com/')

// Log captured request data.
console.log(JSON.stringify(cdpRequestDataRaw, null, 2))

await browser.close()

// Returns map of request ID to raw CDP request data. This will be populated as requests are made.
async function setupLoggingOfAllNetworkData(page) {
    const cdpSession = await page.target().createCDPSession()
    await cdpSession.send('Network.enable')
    const cdpRequestDataRaw = {}
    const addCDPRequestDataListener = (eventName) => {
        cdpSession.on(eventName, request => {
            cdpRequestDataRaw[request.requestId] = cdpRequestDataRaw[request.requestId] || {}
            Object.assign(cdpRequestDataRaw[request.requestId], { [eventName]: request })
        })
    }
    addCDPRequestDataListener('Network.requestWillBeSent')
    addCDPRequestDataListener('Network.requestWillBeSentExtraInfo')
    addCDPRequestDataListener('Network.responseReceived')
    addCDPRequestDataListener('Network.responseReceivedExtraInfo')
    return cdpRequestDataRaw
}
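
To get from the captured map to a single set of request headers (including Cookie), the two request events can be combined per request. The following is a minimal sketch, not part of the original answer: `mergeRequestHeaders` is a hypothetical helper name, and the `sample` object is hand-written illustrative data in the shape of the CDP events, not a real capture. It assumes the headers live under `request.headers` for Network.requestWillBeSent and under `headers` for Network.requestWillBeSentExtraInfo.

```javascript
// Hypothetical helper: merge the headers reported by the two CDP request
// events for one entry of the cdpRequestDataRaw map.
// Header names are lowercased so lookups do not depend on casing.
function mergeRequestHeaders(entry) {
  const base = entry['Network.requestWillBeSent'];
  const extra = entry['Network.requestWillBeSentExtraInfo'];
  const merged = {};
  const addAll = (headers) => {
    for (const [name, value] of Object.entries(headers || {})) {
      merged[name.toLowerCase()] = value;
    }
  };
  // Headers known when the request is created.
  addAll(base && base.request && base.request.headers);
  // Headers added later by the network stack; these win on conflict.
  addAll(extra && extra.headers);
  return merged;
}

// Illustrative data only (assumed shape, not captured from a browser).
const sample = {
  'Network.requestWillBeSent': {
    requestId: '1',
    request: {
      url: 'https://example.com/',
      headers: { 'User-Agent': 'HeadlessChrome' },
    },
  },
  'Network.requestWillBeSentExtraInfo': {
    requestId: '1',
    headers: { 'User-Agent': 'HeadlessChrome', Cookie: 'session=abc' },
  },
};

const headers = mergeRequestHeaders(sample);
console.log(headers.cookie);
```

With real data, the same helper could be applied to each value of cdpRequestDataRaw after page.goto() resolves. Entries that are missing one of the two events are handled by treating the missing headers as empty.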
Hugo

That's because your browser sets a bunch of headers depending on settings and capabilities, and also includes e.g. the cookies that it has stored locally for the specific page.

If you want to add additional headers, you can use methods such as:

page.setExtraHTTPHeaders, to send additional headers with every request the page makes.

page.setUserAgent, to override the User-Agent string.

page.setCookie, to set cookies on the page.

With these you can mimic the extra headers that you see your Chrome browser dispatching.
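
As a rough illustration of how the three methods fit together, here is a minimal sketch. `mimicBrowserHeaders` is a hypothetical helper name, and the user agent, header, and cookie values in the usage comment are placeholders, not values taken from a real browser. Note that this only sets the values by hand; it does not verify that the browser would send them on its own.

```javascript
// Hypothetical helper: apply a user agent, extra headers, and cookies to a
// Puppeteer page before navigating. Every option is optional.
async function mimicBrowserHeaders(page, { userAgent, extraHeaders, cookies } = {}) {
  if (userAgent) {
    await page.setUserAgent(userAgent);
  }
  if (extraHeaders) {
    // Sent with every request the page initiates.
    await page.setExtraHTTPHeaders(extraHeaders);
  }
  if (cookies && cookies.length > 0) {
    // page.setCookie takes one or more cookie objects as separate arguments.
    await page.setCookie(...cookies);
  }
}

// Usage inside an async function, with placeholder values:
//
//   const page = await browser.newPage();
//   await mimicBrowserHeaders(page, {
//     userAgent: 'Mozilla/5.0 (placeholder)',
//     extraHeaders: { 'Accept-Language': 'en-US' },
//     cookies: [{ name: 'session', value: 'abc', url: 'https://example.com/' }],
//   });
//   await page.goto('https://example.com/');
```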

tomahaug
  • The point was to test whether the browser adds cookies that were set in another response. If I add the headers manually, I only test that I added them manually. I can see them added to the request headers in regular Chrome during manual testing, but maybe there is a way to make headless Chrome behave the same way? – Bardt Nov 06 '17 at 08:02