
When I run my Node API with headless: false, Puppeteer opens a browser instance and I get the data. But with headless: true, the site responds with "Access Denied" and nothing is scraped. My code is below.

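const puppeteer = require('puppeteer');

// Note: `url` and `res` come from the enclosing request handler (not shown).
(async () => {
  const browser = await puppeteer.launch({
    headless: false // works; with headless: true the site denies access
  });
  // browser.pages() returns an array; page[0] is the tab opened at launch.
  const page = await browser.pages();
  await page[0].goto(url);

  // Read the product title from the first element with class "p-name".
  const my = await page[0].evaluate(() => {
    let title = document.getElementsByClassName('p-name')[0].innerHTML.trim();
    return title;
  });
  console.log(my);
  res.status(200).json(my);
  await browser.close();
})();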

I searched for a solution and found this one (Puppeteer opens an empty tab in non-headless mode). It did not solve my problem completely, but it helped me close the additional browsers that opened. Thanks in advance.

The URL I want to scrape is: https://www.macys.com/shop/product/nike-big-boys-sportswear-t-shirt?ID=11252136&CategoryID=6086&swatchColor=Dark%20Gray%20Heather

Nazmul Hosen

1 Answer


I think you have to set a user agent explicitly.

await page[0].setUserAgent("Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.75 Safari/537.36")

The code below worked for me.

const puppeteer = require("puppeteer");

async function test() {
  const browser = await puppeteer.launch({
    headless: true
  });
  // browser.pages() returns an array; page[0] is the tab opened at launch.
  const page = await browser.pages();
  // Set a regular desktop Chrome user agent before navigating.
  await page[0].setUserAgent("Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.75 Safari/537.36");
  await page[0].goto("https://www.macys.com/shop/product/nike-big-boys-sportswear-t-shirt?ID=11252136&CategoryID=6086&swatchColor=Dark%20Gray%20Heather");
  // Save a screenshot to confirm what the headless browser actually rendered.
  await page[0].screenshot({path: 'screenshot.png'});
  const my = await page[0].evaluate(() => {
    let title = document.getElementsByClassName('p-name')[0].innerHTML.trim();
    return title;
  });
  console.log(my);

  await browser.close();
}

test();
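
As for why the user agent matters: in the Chrome versions current when this was asked, the default user agent in headless mode contained "HeadlessChrome" instead of "Chrome", and some sites seem to reject requests carrying that marker. Instead of hard-coding a full string, you could derive one from the browser's own default. This is only a rough sketch: `normalizeUserAgent` and `openPage` are hypothetical helper names, and whether your Chrome version still reports "HeadlessChrome" is an assumption you can check by logging `await browser.userAgent()`.

```javascript
// Hypothetical helper: drop the "Headless" marker from a Chrome user-agent string.
// Strings that do not contain the marker are returned unchanged.
function normalizeUserAgent(ua) {
  return ua.replace('HeadlessChrome', 'Chrome');
}

// Hypothetical helper: launch headless, set the adjusted user agent, then navigate.
async function openPage(url) {
  const puppeteer = require('puppeteer'); // loaded lazily, only when a browser is needed
  const browser = await puppeteer.launch({ headless: true });
  const [page] = await browser.pages();

  // Ask the browser for its default user agent and remove the headless marker.
  const defaultUa = await browser.userAgent();
  await page.setUserAgent(normalizeUserAgent(defaultUa));

  await page.goto(url);
  return { browser, page }; // the caller is responsible for browser.close()
}
```

The advantage over a fixed string is that the reported Chrome version stays in sync with the browser Puppeteer actually runs. Sites can detect headless browsers in other ways too, so this may not be enough everywhere.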
madhu P
  • Thanks, it's working. One question: why does a user agent need to be added? – Nazmul Hosen May 09 '21 at 03:49
  • This link might help you better understand Puppeteer's headless mode: https://dev.to/sonyarianto/user-agent-string-difference-in-puppeteer-headless-and-headful-4aoh – madhu P May 10 '21 at 05:13