UPDATE 3/14/23: After a lot of reading, as well as trial and error, I was able to work through some of the pitfalls specifically related to puppeteer + heroku. Ultimately, I found the following updates to my project to be beneficial to moving this along:
- Require executablePath from puppeteer:
  const { executablePath } = require('puppeteer')
  and apply the executablePath() function to the executablePath launch param.
- Add specific puppeteer configuration; this helped work through some chromium errors. Instructions on that can be found here. Do not run npm install before pushing to heroku; heroku will do this for you when it runs your build command (assuming that includes an install command), and the size of the .cache folder may trigger a "remote rejected" error from heroku.
- Add heroku buildpack for puppeteer. This buildpack can be found here.
- Add the following param to puppeteer launch:
  args: ['--no-sandbox', '--disable-setuid-sandbox']
  This helps address the "No usable sandbox!" error and is also required to make the buildpack work.
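For anyone wanting to see the launch-related items from the list above in one place, here is a minimal sketch. The helper name buildLaunchOptions is my own invention for illustration, and it takes the path resolver as a parameter so it can be tried without actually launching a browser; in the real server it would be handed puppeteer's executablePath function.

```javascript
// Sketch of the launch options described in the list above.
// `buildLaunchOptions` is a hypothetical helper name; `resolveExecutablePath`
// stands in for the executablePath() function exported by puppeteer.
function buildLaunchOptions(resolveExecutablePath) {
  return {
    headless: true,
    // Path to the browser binary, as reported by puppeteer itself.
    executablePath: resolveExecutablePath(),
    // Flags that addressed the "No usable sandbox!" error on heroku.
    args: ['--no-sandbox', '--disable-setuid-sandbox'],
  };
}

// Intended usage in server.js (needs the full `puppeteer` package installed):
//   const puppeteer = require('puppeteer');
//   const { executablePath } = require('puppeteer');
//   const browser = await puppeteer.launch(buildLaunchOptions(executablePath));
```

Passing the resolver in, rather than calling require('puppeteer') inside the helper, is only a convenience for trying the options out in isolation.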
After making those updates, I no longer get puppeteer-specific errors, and the console logs in my heroku logs imply the puppeteer code in server.js is running. Two problems remain: the screenshots still do not show up in the designated folder, and the request still ends with a 503. I have a hunch that this is a different issue from the heroku + puppeteer challenges (although I won't be surprised if it is at least somewhat related), so I will spend some time with it and will likely post a separate, more focused question once I have a better handle on the problem, or at least have spent enough time banging my head against it. I will leave this question here in case others find any of the above helpful.
The app: This app is meant to be an internal tool that allows someone to leverage puppeteer.js to take automated screenshots without having to use the command line to get it done (ultimately this is a tool meant for designers). The basic idea is they can input the url to the web page that has the images to capture, along with a file path for where they want the images to live locally on their system. This works perfectly in my local environment and didn't start having problems until after deployment.
The issue: On initial deployment, the app loads exactly as expected, with no console errors. Upon trying to use it, however, it throws a 503 error on the GET request that sends the server both the url for the images to screenshot and the file path for where to store them. I'm at a bit of a loss as to why this is an issue in the deployed state but not locally. The 503 error seems to indicate that the entire compiled url, including the parameters I'm attempting to pass with it, is being treated as the actual api endpoint (as opposed to just '/dcsgrab'). That is different from the local behavior and doesn't seem right to me, but I'm not super confident on that. There is also an error about puppeteer-core requiring an "executablePath" or "channel"; I'm currently researching this but not finding a lot of info right off the bat (I assumed line 28 in server.js addresses that, but it seems there might still be a problem).
Console error:
GET https://dcsgrab.herokuapp.com/dcsgrab?tearsheetUrl=https://www.google.com/doubleclick/preview/dynamic/previewsheet/CMP6kgUQ3cxBGLbTkhUgicQs&imagefilelocation=/Users/tyler.anyan/Downloads/dcsgrab-test-folder/images/ 503 (Service Unavailable) App.js:27
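The request above carries a full url and a file path as raw, unencoded query values. Nothing in the logs establishes that this causes the 503, but a tearsheet url that itself contains ? or & would be split incorrectly. A hedged sketch of encoding both values with URLSearchParams follows; buildScreenshotRequestUrl is a hypothetical helper name.

```javascript
// Build the request URL so that both user-supplied values are percent-encoded.
// `buildScreenshotRequestUrl` is a hypothetical helper, not part of the app.
function buildScreenshotRequestUrl(tearsheetUrl, imageFileLocation) {
  const params = new URLSearchParams({
    tearsheetUrl,
    imagefilelocation: imageFileLocation,
  });
  return `/dcsgrab?${params.toString()}`;
}

// Intended usage in App.js:
//   fetch(buildScreenshotRequestUrl(screenShotData, imageFileLocationData))
```

Express decodes query values when it populates request.query, so the server side would keep reading request.query.tearsheetUrl and request.query.imagefilelocation unchanged.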
Heroku log: For the sake of not making this too crowded and lengthy, I'm only including the single error I found in the overall list from the heroku logs command. Please let me know if the rest would be helpful and I will update it.
2023-03-09T18:26:29.726820+00:00 app[web.1]: /app/node_modules/puppeteer-core/lib/cjs/puppeteer/util/assert.js:28
2023-03-09T18:26:29.726832+00:00 app[web.1]: throw new Error(message);
2023-03-09T18:26:29.726833+00:00 app[web.1]: ^
2023-03-09T18:26:29.726834+00:00 app[web.1]:
2023-03-09T18:26:29.726834+00:00 app[web.1]: Error: An `executablePath` or `channel` must be specified for `puppeteer-core`
2023-03-09T18:26:29.726835+00:00 app[web.1]: at assert (/app/node_modules/puppeteer-core/lib/cjs/puppeteer/util/assert.js:28:15)
2023-03-09T18:26:29.726836+00:00 app[web.1]: at ChromeLauncher.launch (/app/node_modules/puppeteer-core/lib/cjs/puppeteer/node/ChromeLauncher.js:92:36)
2023-03-09T18:26:29.726837+00:00 app[web.1]: at async /app/server/server.js:26:19
2023-03-09T18:26:29.730410+00:00 heroku[router]: at=error code=H13 desc="Connection closed without response" method=GET path="/dcsgrab?tearsheetUrl=https://www.google.com/doubleclick/preview/dynamic/previewsheet/CMP6kgUQ3cxBGLbTkhUgicQs&imagefilelocation=/Users/my.name/Downloads/dcsgrab-test-folder/images/" host=dcsgrab.herokuapp.com request_id=9bc2093a-401d-464f-87fd-4b72dc4998a0 fwd="24.16.69.155" dyno=web.1 connect=0ms service=10ms status=503 bytes=0 protocol=https
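The stack trace above ends inside the async function in server.js, and the router reports H13 right afterwards, which is consistent with the launch error going uncaught rather than being turned into a response. As a sketch only, an async route handler could be wrapped so that a rejection produces a JSON error instead; withErrorResponse is a hypothetical helper name, and the 500 status is my assumption about what would be appropriate.

```javascript
// Wrap an async Express-style handler so that a rejected promise becomes a
// JSON error response instead of an unhandled rejection.
// `withErrorResponse` is a hypothetical helper name.
function withErrorResponse(handler) {
  return async (request, response) => {
    try {
      await handler(request, response);
    } catch (error) {
      console.error(error.message);
      response.status(500).json({ message: error.message });
    }
  };
}

// Intended usage with the route from server.js:
//   app.get('/dcsgrab', withErrorResponse(async (request, response) => {
//     // puppeteer work goes here
//   }));
```

This would not fix the underlying executablePath problem; it would only make the failure visible to the front end as an error message.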
Relevant code: These are the bits that seem to be relevant based on what's happening in the errors, but again, let me know if it would be helpful to include more and I will do my best to update things.
Front end (react) App.js:
import React, { useState, useRef } from 'react';
import './App.css';
import DataInput from './Components/data-input';
import Footer from './Components/footer';
import Header from './Components/header';

function App() {
  const [data, setData] = useState(null);
  const [screenShotData, setScreenshotData] = useState(null);
  const [imageFileLocationData, setImageFileLocationData] = useState(null);
  const [statusMessage, showStatusMessage] = useState(false);
  const waitingAnimationRef = useRef(null);

  const getScreenshotData = (screenShotData, imageFileLocationData) => {
    setScreenshotData(screenShotData);
    setImageFileLocationData(imageFileLocationData);
    showStatusMessage(true);
    setData('');
    fetch(`/dcsgrab?tearsheetUrl=${screenShotData}&imagefilelocation=${imageFileLocationData}`)
      .then((response) => response.json())
      .then((data) => setData(data.message));
  };

  return (
    <div className="App">
      <Header />
      <DataInput getScreenshotData={getScreenshotData} />
      {
        !statusMessage ? '' : <p>{!data ? 'Taking screenshots...' : data}</p>
      }
      <Footer />
    </div>
  );
}

export default App;
data-input.js:
import React, { useState } from 'react';

function DataInputForm({ getScreenshotData }) {
  const [tearsheetUrl, setTearsheetUrl] = useState('');
  const [imageFileLocation, setimageFileLocation] = useState('');

  return (
    <div>
      <form>
        <input
          id='tearsheetUrl'
          name='tearsheetUrl'
          placeholder='input tearsheet url'
          type='text'
          value={tearsheetUrl}
          onChange={(event) => setTearsheetUrl(event.target.value)}
        />
        <input
          id='imageFileLocation'
          name='imageFileLocation'
          placeholder='input image directory location'
          type='text'
          value={imageFileLocation}
          onChange={(event) => setimageFileLocation(event.target.value)}
        />
      </form>
      <button id="submit-data" onClick={() => getScreenshotData(tearsheetUrl, imageFileLocation)}>SUBMIT</button>
    </div>
  );
}

export default DataInputForm;
Back end server.js:
const express = require('express');
const path = require('path');
const PORT = process.env.PORT || 3001;
const puppeteer = require('puppeteer-core');
const os = require('os');
// TODO/NICE TO HAVE: Figure out chrome paths for linux
const CHROME_PATHS = {
  darwin: '/Applications/Google Chrome.app/Contents/MacOS/Google Chrome',
  linux: undefined,
  win32: 'C:\\Program Files (x86)\\Google\\Chrome\\Application\\chrome.exe',
};
const CHROME_PATH = CHROME_PATHS[os.platform()];
const PREVIEW_SELECTOR = '.dynamic-ad-card-back iframe';
const NEXT_SELECTOR = '.md-icon-button[aria-label="Next"]';
const PIXEL_DENSITY = 2;
let DELAY_FOR_ANIMATION = 15000;
const app = express();
app.use(express.static(path.resolve(__dirname, '../dcsgrab/build')));
app.get('/dcsgrab', (request, response) => {
  (async () => {
    const browser = await puppeteer.launch({
      headless: true,
      executablePath: CHROME_PATH,
    });
    let screenshotCounter = 1;
    const page = await browser.newPage();
    // setViewport returns a promise, so wait for it before navigating
    await page.setViewport({width: 1280, height: 6000, deviceScaleFactor: PIXEL_DENSITY});
    await page.goto(request.query.tearsheetUrl, { waitUntil: 'networkidle0' });

    /**
     * Checks if the pagination button is active
     * @return {Promise.<Boolean>} Promise which resolves with a true boolean if the button is active
     */
    async function isNextButtonActive() {
      return await page.evaluate((selector) => {
        return !document.querySelector(selector).disabled;
      }, NEXT_SELECTOR);
    }

    /**
     * Clicks the pagination button
     * @return {Promise} Promise which resolves when the element matching selector is successfully clicked. The Promise will be rejected if there is no element matching selector
     */
    async function clickNextButton() {
      return await page.click(NEXT_SELECTOR, {delay: 100});
    }

    /**
     * Waits for the loading spinner widget to go away, indicating the iframes have been added to the page
     * @return {Promise.<undefined>}
     */
    async function waitForLoadingWidget() {
      return await page.waitForSelector('.preview-loading-widget', {hidden: true}).then(() => {
        console.log('Loading widget is gone');
      })
      .catch(e => {
        console.log(e.message);
      });
    }

    /**
     * Gets the name of the tear sheet
     * @return {Promise<string>} The name
     */
    async function getSheetName() {
      return await page.evaluate((selector) => {
        return document.querySelector(selector).textContent.replace(/[*."/\\[\]:;|=,]/g, '-');
      }, '.preview-sheet-header-text span');
    }

    /**
     * Screenshot the creative elements on the current page
     * @return {Promise.<Array>} Promise which resolves with an array of clipping paths
     */
    async function getScreenShots() {
      const rects = await page.$$eval(PREVIEW_SELECTOR, iframes => {
        return Array.from(iframes, (el) => {
          const {x, y, width, height} = el.getBoundingClientRect();
          return {
            left: x,
            top: y,
            width,
            height,
            id: el.id,
          };
        });
      }, PREVIEW_SELECTOR).catch(e => {
        console.error(e.message);
      });

      return Promise.all(rects.map(async (rect) => {
        return await page.screenshot({
          path: `${request.query.imagefilelocation}${await getSheetName()}-screenshot-${screenshotCounter++}.png`,
          clip: {
            x: rect.left,
            y: rect.top,
            width: rect.width,
            height: rect.height,
          },
        }).then(() => {
          console.log(`${rect.id} element captured.`);
        })
        .catch((e) => {
          console.error(e.message);
        });
      }));
    }

    // Wait a bit then take screenshots
    await new Promise(resolve => setTimeout(resolve, DELAY_FOR_ANIMATION));
    await getScreenShots().catch((e) => console.error(e.message));

    // Continue taking screenshots till there are no pages left
    while (await isNextButtonActive()) {
      await clickNextButton();
      await waitForLoadingWidget();
      await new Promise(resolve => setTimeout(resolve, DELAY_FOR_ANIMATION));
      await getScreenShots().catch((e) => console.error(e.message));
    }

    await browser.close();
    response.json({ message: 'Screenshots are done!\nPlease check the root directory you previously designated.' });
  })();
});
app.get('*', (request, response) => {
  response.sendFile(path.resolve(__dirname, '../dcsgrab/build', 'index.html'));
});

app.listen(PORT, () => {
  console.log(`Server is listening on port ${PORT}`);
});
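One thing worth noting about the server.js code above: a heroku dyno runs Linux, so os.platform() returns 'linux' there and CHROME_PATH resolves to undefined, which matches the "executablePath or channel must be specified" error in the log. Below is a minimal sketch of a lookup that fails early with a clearer message and allows an override. CHROME_EXECUTABLE_PATH is a hypothetical environment variable name that I made up for illustration, as is the resolveChromePath helper.

```javascript
const os = require('os');

const CHROME_PATHS = {
  darwin: '/Applications/Google Chrome.app/Contents/MacOS/Google Chrome',
  linux: undefined,
  win32: 'C:\\Program Files (x86)\\Google\\Chrome\\Application\\chrome.exe',
};

// Resolve the Chrome binary for the current platform.
// CHROME_EXECUTABLE_PATH is a hypothetical environment variable used as an
// override; it is not something puppeteer or heroku defines.
function resolveChromePath(platform = os.platform(), env = process.env) {
  const chromePath = env.CHROME_EXECUTABLE_PATH || CHROME_PATHS[platform];
  if (!chromePath) {
    throw new Error(`No Chrome path configured for platform "${platform}"`);
  }
  return chromePath;
}

// Intended usage in server.js:
//   const CHROME_PATH = resolveChromePath();
```

Failing at startup like this would surface the missing path in the heroku logs before any request comes in, instead of on the first call to /dcsgrab.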