I am aware of the Pyppeteer library and Pyppeteer Stealth, but the website I am trying to scrape detects Pyppeteer Stealth (the Python port of Puppeteer) and blocks it. The original Puppeteer Stealth on Node.js works fine on that website; however, I would much rather write this scraper in Python since I am much more familiar with it.

Which other stealthy and up-to-date headless browser automation tools are available?

All I need it for is grabbing the HTML content and parsing it with Beautiful Soup. Unfortunately, the requests and requests-html libraries do not work on this website either.

techxz

1 Answer

If you don't care much about the browser automation part, I would recommend looking into Scrapy (plus Scrapy Splash if you need JavaScript rendered, which I assume is why you want to use Pyppeteer in the first place), combined with some basic tactics to avoid being detected as a bot, such as user-agent rotation and proxy rotation.

I am currently using the same tactic to build a scraper for similarweb.com.
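
To make the user-agent and proxy rotation idea more concrete, here is a rough sketch of a Scrapy-style downloader middleware. The class name, the user-agent strings, and the proxy URLs are all placeholders I made up for illustration, and this is only one possible way to do it, not a guaranteed way past any particular site's bot detection. It relies on Scrapy reading the proxy from `request.meta["proxy"]` and the user agent from the request headers.

```python
import itertools
import random

# Placeholder values for illustration only; substitute your own lists.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 "
    "(KHTML, like Gecko) Version/17.0 Safari/605.1.15",
]
PROXIES = [
    "http://proxy1.example.com:8080",  # hypothetical proxy endpoints
    "http://proxy2.example.com:8080",
]


class RotatingIdentityMiddleware:
    """Hypothetical downloader middleware that varies the user agent and proxy."""

    def __init__(self, user_agents=USER_AGENTS, proxies=PROXIES):
        self.user_agents = list(user_agents)
        # Walk through the proxies in round-robin order.
        self.proxy_cycle = itertools.cycle(proxies)

    def process_request(self, request, spider):
        # Pick a random user agent for every outgoing request.
        request.headers["User-Agent"] = random.choice(self.user_agents)
        # Scrapy's built-in HttpProxyMiddleware reads the proxy from request.meta.
        request.meta["proxy"] = next(self.proxy_cycle)
        # Returning None lets Scrapy continue processing the request normally.
        return None
```

To try it, you would register the class under `DOWNLOADER_MIDDLEWARES` in your project's `settings.py`, with an order number below that of the built-in proxy middleware so the proxy is set before it runs (something like 400 should work, but check the defaults for your Scrapy version).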

Beenyaa