In my web scraper function, I have the following code:
let portList = [9050, 9052, 9053, 9054, 9055, 9056, 9057, 9058, 9059, 9060];
let spoofPort = portList[Math.floor(Math.random()*portList.length)];
console.log("The chosen port was " + spoofPort);
const browser = await puppeteerExtra.launch({
  headless: true,
  args: [
    '--no-sandbox',
    '--disable-setuid-sandbox',
    '--proxy-server=socks5://127.0.0.1:' + spoofPort
  ]
});
const page = await browser.newPage();
const userAgent = 'Mozilla/5.0 (X11; Linux x86_64) ' +
'AppleWebKit/537.36 (KHTML, like Gecko) Chrome/64.0.3282.39 Safari/537.36';
await page.setUserAgent(userAgent);
I'm trying to rotate the IP address on each request (the function containing this code is called for every request from a client) so that the scraped website doesn't block me as quickly. Instead, I get the following error:
2021-05-17T12:08:19.625349+00:00 app[web.1]: The chosen port was 9050
2021-05-17T12:08:20.042016+00:00 app[web.1]: Error: net::ERR_PROXY_CONNECTION_FAILED at https://expampleDomanPlaceholder.com
2021-05-17T12:08:20.042018+00:00 app[web.1]: at navigate (/app/node_modules/puppeteer/lib/cjs/puppeteer/common/FrameManager.js:115:23)
2021-05-17T12:08:20.042018+00:00 app[web.1]: at processTicksAndRejections (internal/process/task_queues.js:93:5)
2021-05-17T12:08:20.042019+00:00 app[web.1]: at async FrameManager.navigateFrame (/app/node_modules/puppeteer/lib/cjs/puppeteer/common/FrameManager.js:90:21)
2021-05-17T12:08:20.042020+00:00 app[web.1]: at async Frame.goto (/app/node_modules/puppeteer/lib/cjs/puppeteer/common/FrameManager.js:416:16)
2021-05-17T12:08:20.042021+00:00 app[web.1]: at async Page.goto (/app/node_modules/puppeteer/lib/cjs/puppeteer/common/Page.js:819:16)
2021-05-17T12:08:20.042021+00:00 app[web.1]: at async /app/app.js:174:9
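Since ERR_PROXY_CONNECTION_FAILED suggests that Chromium could not reach the proxy itself, one way to narrow this down might be to check whether anything is actually listening on those ports before launching the browser. Below is a minimal sketch using only Node's built-in net module. The helper names isPortOpen and findOpenPorts are hypothetical (they are not part of Puppeteer), and the 1000 ms timeout is an arbitrary assumption.

```javascript
const net = require('net');

// Hypothetical helper: resolves true if a TCP connection to host:port
// succeeds within timeoutMs, and false on error or timeout.
function isPortOpen(port, host = '127.0.0.1', timeoutMs = 1000) {
  return new Promise((resolve) => {
    const socket = net.createConnection({ port, host });
    let settled = false;

    // Resolve once, then release the socket.
    const finish = (result) => {
      if (settled) return;
      settled = true;
      socket.destroy();
      resolve(result);
    };

    socket.setTimeout(timeoutMs);
    socket.once('connect', () => finish(true));
    socket.once('timeout', () => finish(false));
    socket.once('error', () => finish(false));
  });
}

// Hypothetical helper: returns the subset of ports that accept connections.
async function findOpenPorts(ports, host = '127.0.0.1') {
  const results = await Promise.all(ports.map((p) => isPortOpen(p, host)));
  return ports.filter((_, i) => results[i]);
}
```

Calling findOpenPorts(portList) inside the dyno and logging the result should show which of the ports, if any, have a listener. An empty array would point to no SOCKS proxy running there at all, rather than a problem with the user agent.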
I've tried the solutions detailed in the posts below without success. Could the issue be with my userAgent instead?
Getting error when attempting to use proxy server in Node.js / Puppeteer
https://github.com/puppeteer/puppeteer/issues/2472
UPDATE: I tried this buildpack (https://github.com/iamashks/heroku-buildpack-tor-proxy.git), but it kept breaking my web dyno: an 'H14' error was returned, and I had to clear the buildpacks and re-add them to recover. I'm not sure how to proceed from here, as that seemed to be the only solution I could find.