13

How can I get 3rd-party cookies from a website using Puppeteer?

For first-party cookies, I know I can use:

await page.cookies()
Grant Miller
Piotr Wu
  • You probably can't. JS doesn't usually have that kind of access. – Cerbrus May 09 '18 at 12:11
  • There must be some way. For example, Cookiepedia extracts all cookies from any page – Piotr Wu May 09 '18 at 12:15
  • What makes you think cookiepedia is reading _your_ cookies? – Cerbrus May 09 '18 at 12:17
  • Not my cookies. It just reads the page's cookies. It probably opens the given page and grabs all cookies, both 1st- and 3rd-party – Piotr Wu May 09 '18 at 12:21
  • What makes you think it's using JavaScript to do that? – Cerbrus May 09 '18 at 12:22
  • It uses a browser for sure, so you can use a headless browser and control it with, for example, Puppeteer or something similar – Piotr Wu May 09 '18 at 12:24
  • @PiotrWójcik Interesting. Could you share a URL on Cookiepedia where it reads your 3rd-party cookies? – Vaviloff May 10 '18 at 08:46
  • In fact, I found 2 ways of doing it, and I can get all cookies. I need to refactor the code first; once I find some spare time I will post the solution here – Piotr Wu May 11 '18 at 09:38
  • 1
    @PiotrWójcik, would you share your research? I'm curious about the 2nd way you mentioned. – Vaviloff May 22 '18 at 14:45
  • 2
    @Vaviloff Sorry, I totally forgot about this. Anyway, that code is long gone, but I found some parts of it, so this may be incomplete: 1. Puppeteer creates a ./Default/Cookies file in the tested directory, which is simply an SQLite database, and all cookies are in there. – Piotr Wu Jul 11 '18 at 10:48

3 Answers

33

I was curious about this too and found a solution. It works with Chromium 75.0.3765.0 and Puppeteer 1.15.0 (updated May 2nd, 2019).

Using Puppeteer's internal page._client object, we can talk to the Chrome DevTools Protocol directly. Note that page._client is a private API, so it may not behave the same way in other Puppeteer versions:

const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch({});
  const page = await browser.newPage();
  await page.goto('https://stackoverflow.com', { waitUntil: 'networkidle2' });

  // Ask the DevTools Protocol for every cookie in the browser, not only the page's own
  console.log(await page._client.send('Network.getAllCookies'));

  await browser.close();
})();

The returned object contains cookies for google.com and imgur.com, which we couldn't have obtained with ordinary in-page JavaScript:

[Screenshot: third-party cookies in the returned object]
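To make the returned object easier to scan, the cookies can be grouped by domain. The following is only a minimal sketch: cookieNamesByDomain is a hypothetical helper (not part of Puppeteer), and it assumes the result has the shape { cookies: [{ name, domain, ... }] }, which is what Network.getAllCookies returned here. The sample data is made up for illustration.

```javascript
// Hypothetical helper (not a Puppeteer API): groups cookie names by domain.
// Assumes `result` looks like { cookies: [{ name, domain, ... }, ...] }.
function cookieNamesByDomain(result) {
  const byDomain = new Map();
  for (const cookie of result.cookies) {
    if (!byDomain.has(cookie.domain)) {
      byDomain.set(cookie.domain, []);
    }
    byDomain.get(cookie.domain).push(cookie.name);
  }
  return byDomain;
}

// Made-up sample data standing in for a real Network.getAllCookies result
const sampleResult = {
  cookies: [
    { name: 'session', domain: '.stackoverflow.com' },
    { name: 'tracker', domain: '.google.com' },
    { name: 'prefs', domain: '.stackoverflow.com' },
  ],
};

for (const [domain, names] of cookieNamesByDomain(sampleResult)) {
  console.log(domain, names.join(', '));
}
```

In the script above, the result of page._client.send('Network.getAllCookies') would be passed in place of sampleResult.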

Vaviloff
12

You can create a Chrome DevTools Protocol session on the page target using target.createCDPSession(). Then you can send Network.getAllCookies to obtain a list of all browser cookies.

The page.cookies() function will only return cookies for the current URL. So we can filter out the current page cookies from all of the browser cookies to obtain a list of third-party cookies only.

const client = await page.target().createCDPSession();
const all_browser_cookies = (await client.send('Network.getAllCookies')).cookies;
const current_url_cookies = await page.cookies();
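// Compare against every current-URL cookie domain; this also works when the page set no cookies
const current_url_domains = new Set(current_url_cookies.map(cookie => cookie.domain));
const third_party_cookies = all_browser_cookies.filter(cookie => !current_url_domains.has(cookie.domain));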

console.log(all_browser_cookies); // All Browser Cookies
console.log(current_url_cookies); // Current URL Cookies
console.log(third_party_cookies); // Third-Party Cookies
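If the page itself sets no cookies, there are no current-URL domains to compare against. One alternative is to compare each cookie's domain with the page's hostname instead. The sketch below assumes a simple RFC 6265-style domain match (the host equals the cookie domain or is a subdomain of it); splitCookiesByParty is a hypothetical helper, not a Puppeteer API, and the sample cookies are made up.

```javascript
// Hypothetical helper (not a Puppeteer API): splits cookie objects into
// first- and third-party lists relative to the hostname of `pageUrl`.
function splitCookiesByParty(cookies, pageUrl) {
  const host = new URL(pageUrl).hostname.toLowerCase();

  // Simplified domain match: ignore a leading dot, then accept an exact
  // match or a parent domain of the host.
  const appliesToHost = (cookieDomain) => {
    const domain = cookieDomain.replace(/^\./, '').toLowerCase();
    return host === domain || host.endsWith('.' + domain);
  };

  const firstParty = [];
  const thirdParty = [];
  for (const cookie of cookies) {
    (appliesToHost(cookie.domain) ? firstParty : thirdParty).push(cookie);
  }
  return { firstParty, thirdParty };
}

// Made-up sample data shaped like entries of the Network.getAllCookies `cookies` array
const sampleCookies = [
  { name: 'session', domain: '.stackoverflow.com' },
  { name: 'tracker', domain: '.google.com' },
  { name: 'img', domain: 'i.stack.imgur.com' },
];

const split = splitCookiesByParty(sampleCookies, 'https://stackoverflow.com/questions');
console.log(split.firstParty.map(c => c.name));
console.log(split.thirdParty.map(c => c.name));
```

With the snippet above, this would be called as splitCookiesByParty(all_browser_cookies, page.url()). This is a rough approximation: it does not consult the public suffix list, so it treats sibling subdomains of the same site as third-party.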
Grant Miller
  • Hello @Grant Miller - I was wondering how we pass a specific web page to this, to acquire its cookies for inspection? – joe hoeller Mar 18 '20 at 14:36
1
const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch({});
  const page = await browser.newPage();
  await page.goto('https://www.stackoverflow.com/', { waitUntil: 'networkidle0' });
  // Other waitUntil options: networkidle2, domcontentloaded, load
  // Here we can get all of the cookies
  const content = await page._client.send('Network.getAllCookies');

  console.log(JSON.stringify(content, null, 4));
  await browser.close();
})();
mujuonly
  • What does this add to the [top answer](https://stackoverflow.com/a/50290081/6243352)? This doesn't work on Pupp 19.1.0: `TypeError: page._client.send is not a function` – ggorlen Jan 13 '23 at 22:57