App works locally but has issues after Heroku deployment

Question

UPDATE 3/14/23: After a lot of reading and some trial and error, I was able to work through several pitfalls specific to Puppeteer on Heroku. The following changes to my project moved things along:

1. Require executablePath from puppeteer (`const { executablePath } = require('puppeteer')`) and pass `executablePath()` as the `executablePath` launch param.
2. Add a specific Puppeteer configuration; this helped work through some Chromium errors. Instructions on that can be found here. Do not run `npm install` before pushing to Heroku: Heroku does this for you when it runs your build command (assuming that includes an install command), and the size of the `.cache` folder may trigger a "remote rejected" error from Heroku.
3. Add the Heroku buildpack for Puppeteer. This buildpack can be found here.
4. Add the following param to the Puppeteer launch options: `args: ['--no-sandbox', '--disable-setuid-sandbox']`. This addresses the "No usable sandbox!" error and is also required to make the buildpack work.
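The first and last changes in the update above both end up in the object passed to `puppeteer.launch()`. A minimal sketch of that object, assuming puppeteer v19 (which exports an `executablePath` function); the helper name `buildLaunchOptions` is hypothetical, and the resolver is injected so the snippet runs without puppeteer installed:

```javascript
// Hypothetical helper: builds the launch options described in the update.
// In server.js the resolver would be puppeteer's own function, e.g.:
//   const puppeteer = require('puppeteer');
//   const { executablePath } = require('puppeteer');
//   const browser = await puppeteer.launch(buildLaunchOptions(executablePath));
function buildLaunchOptions(resolveExecutablePath) {
  return {
    headless: true,
    // Path to the Chromium build that puppeteer installed.
    executablePath: resolveExecutablePath(),
    // Per the update: avoids the "No usable sandbox!" error on Heroku and
    // is required by the Puppeteer buildpack.
    args: ['--no-sandbox', '--disable-setuid-sandbox'],
  };
}
```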

After making those updates, I no longer see Puppeteer-specific errors, and the Heroku logs show console output suggesting that the Puppeteer code in server.js is running. However, the screenshots still don't appear in the designated folder, and the request still ends with a 503. I suspect this is a separate issue from the Heroku + Puppeteer challenges (though I won't be surprised if it's at least partly related), so I'll spend some time on it and will likely post a separate, more focused question once I have a better handle on the problem, or at least have spent enough time banging my head against it. I'm leaving this question up in case others find any of the above helpful.
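The missing screenshots may come down to where the files are written. The `path` option passed to `page.screenshot()` in `getScreenShots()` is handled by the Node process running server.js, so on Heroku it most likely points into the dyno's filesystem rather than the designer's machine. A minimal sketch isolating that filename logic in a hypothetical `buildScreenshotPath` helper (not part of the app), using `path.join` so that a directory typed without a trailing slash still yields a valid path:

```javascript
const path = require('path');

// Hypothetical helper mirroring the path template in getScreenShots().
// path.join() adds a separator only when needed, so "/a/images" and
// "/a/images/" produce the same file path. The result is resolved on
// whichever machine runs server.js (on Heroku, the dyno).
function buildScreenshotPath(directory, sheetName, counter) {
  return path.join(directory, `${sheetName}-screenshot-${counter}.png`);
}
```

With the original template string, a directory entered without a trailing slash would be glued directly onto the sheet name.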

The app: This is an internal tool that lets someone use puppeteer.js to take automated screenshots without going through the command line (it's ultimately meant for designers). The user enters the URL of the web page containing the images to capture, along with a file path for where the images should be saved locally on their system. This works perfectly in my local environment; the problems only started after deployment.

The issue: After the initial deployment, the app loads exactly as expected with no console errors. When I try to use it, however, the GET request that sends the screenshot URL and the destination file path to the server fails with a 503, and I'm at a loss as to why this happens in the deployed app but not locally. The 503 seems to indicate that the entire compiled URL, including the parameters I'm passing, is being treated as the API endpoint (rather than just '/dcsgrab'). That differs from the local behavior and doesn't seem right to me, but I'm not very confident about it. There is also an error about puppeteer-core requiring an "executablePath" or "channel" to be specified; I'm researching it but haven't found much information yet (I assumed line 28 in server.js handles that, but there still seems to be a problem).

Console error:

GET https://dcsgrab.herokuapp.com/dcsgrab?tearsheetUrl=https://www.google.com/doubleclick/preview/dynamic/previewsheet/CMP6kgUQ3cxBGLbTkhUgicQs&imagefilelocation=/Users/tyler.anyan/Downloads/dcsgrab-test-folder/images/ 503 (Service Unavailable) App.js:27
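To check the hunch that the whole URL is being treated as the endpoint, it can help to split the request URL above into its path and query parts. This sketch uses Node's WHATWG `URL` class as a stand-in for Express's own query parser (an assumption; for simple key=value pairs like these the values should match). Express matches `app.get('/dcsgrab', …)` against the pathname only:

```javascript
// Sketch: split a request URL into the pieces Express works with.
// Routing uses only the pathname; the part after "?" is parsed separately
// into request.query.
function splitRequestUrl(rawUrl) {
  const url = new URL(rawUrl);
  return {
    pathname: url.pathname,
    tearsheetUrl: url.searchParams.get('tearsheetUrl'),
    imagefilelocation: url.searchParams.get('imagefilelocation'),
  };
}
```

For the URL in the console error, the pathname is `/dcsgrab` and both parameters come back intact. A tearsheet URL that itself contained `&` or `#` would be split incorrectly unless it were percent-encoded before being placed in the query string.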

Heroku log: To keep this from getting too long, I'm only including the single error I found in the output of the heroku logs command. Please let me know if the rest would be helpful and I'll add it.

2023-03-09T18:26:29.726820+00:00 app[web.1]: /app/node_modules/puppeteer-core/lib/cjs/puppeteer/util/assert.js:28
2023-03-09T18:26:29.726832+00:00 app[web.1]: throw new Error(message);
2023-03-09T18:26:29.726833+00:00 app[web.1]: ^
2023-03-09T18:26:29.726834+00:00 app[web.1]: 
2023-03-09T18:26:29.726834+00:00 app[web.1]: Error: An `executablePath` or `channel` must be specified for `puppeteer-core`
2023-03-09T18:26:29.726835+00:00 app[web.1]: at assert (/app/node_modules/puppeteer-core/lib/cjs/puppeteer/util/assert.js:28:15)
2023-03-09T18:26:29.726836+00:00 app[web.1]: at ChromeLauncher.launch (/app/node_modules/puppeteer-core/lib/cjs/puppeteer/node/ChromeLauncher.js:92:36)
2023-03-09T18:26:29.726837+00:00 app[web.1]: at async /app/server/server.js:26:19
2023-03-09T18:26:29.730410+00:00 heroku[router]: at=error code=H13 desc="Connection closed without response" method=GET path="/dcsgrab?tearsheetUrl=https://www.google.com/doubleclick/preview/dynamic/previewsheet/CMP6kgUQ3cxBGLbTkhUgicQs&imagefilelocation=/Users/my.name/Downloads/dcsgrab-test-folder/images/" host=dcsgrab.herokuapp.com request_id=9bc2093a-401d-464f-87fd-4b72dc4998a0 fwd="24.16.69.155" dyno=web.1 connect=0ms service=10ms status=503 bytes=0 protocol=https
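The `executablePath` error above is raised because, on a Heroku dyno, `os.platform()` returns `'linux'`, and the `CHROME_PATHS` table in server.js maps `linux` to `undefined`. A minimal sketch of a lookup that falls back to another resolver (for example puppeteer's `executablePath`, per the update) and otherwise fails with a clearer message; `resolveChromePath` and its `fallback` parameter are hypothetical:

```javascript
const os = require('os');

// Same table as server.js; Linux (Heroku) has no entry.
const CHROME_PATHS = {
  darwin: '/Applications/Google Chrome.app/Contents/MacOS/Google Chrome',
  linux: undefined,
  win32: 'C:\\Program Files (x86)\\Google\\Chrome\\Application\\chrome.exe',
};

// Hypothetical helper: prefer the table entry for this platform, otherwise
// ask `fallback`, and throw a descriptive error instead of handing
// `undefined` to puppeteer.launch().
function resolveChromePath(platform = os.platform(), fallback = () => undefined) {
  const chromePath = CHROME_PATHS[platform] ?? fallback();
  if (!chromePath) {
    throw new Error(`No Chrome executable configured for platform "${platform}"`);
  }
  return chromePath;
}
```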

Relevant code: These are the parts that seem relevant based on the errors, but again, let me know if more would help and I'll do my best to add it.

Front end (react) App.js:

import React, { useState, useRef } from 'react';
import './App.css';
import DataInput from './Components/data-input';
import Footer from './Components/footer';
import Header from './Components/header';

function App() {
  const [data, setData] = useState(null);
  const [screenShotData, setScreenshotData] = useState(null);
  const [imageFileLocationData, setImageFileLocationData] = useState(null);
  const [statusMessage, showStatusMessage] = useState(false);

  const waitingAnimationRef = useRef(null);

  const getScreenshotData = (screenShotData, imageFileLocationData) => {
    setScreenshotData(screenShotData);
    setImageFileLocationData(imageFileLocationData);
    showStatusMessage(true);
    setData('');

    fetch(`/dcsgrab?tearsheetUrl=${screenShotData}&imagefilelocation=${imageFileLocationData}`)
      .then((response) => response.json())
      .then((data) => setData(data.message));
  };

  return (
    <div className="App">
      <Header />
      <DataInput getScreenshotData={getScreenshotData} />
      {
        !statusMessage ? '' : <p>{!data ? 'Taking screenshots...' : data}</p>
      }
      <Footer />
    </div>
  );
}

export default App;
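In `getScreenshotData`, the two values are interpolated into the query string as-is, and `response.json()` is called even when the server answers with a 503 and an empty body. A minimal sketch of a more defensive request, using the standard `URLSearchParams` to percent-encode both values and checking `response.ok`; the helper names are hypothetical, and `fetchImpl` exists only so the function can be exercised without a browser:

```javascript
// Hypothetical helper: builds the /dcsgrab URL with both values encoded,
// so "&", "?" or "#" inside the tearsheet URL can't break the outer query.
function buildScreenshotRequestUrl(tearsheetUrl, imageFileLocation) {
  const params = new URLSearchParams({
    tearsheetUrl,
    imagefilelocation: imageFileLocation,
  });
  return `/dcsgrab?${params.toString()}`;
}

// Hypothetical helper: surfaces non-2xx responses (like the 503 above) as
// errors instead of trying to parse an empty body as JSON.
async function requestScreenshots(tearsheetUrl, imageFileLocation, fetchImpl = fetch) {
  const response = await fetchImpl(buildScreenshotRequestUrl(tearsheetUrl, imageFileLocation));
  if (!response.ok) {
    throw new Error(`Request failed with status ${response.status}`);
  }
  const data = await response.json();
  return data.message;
}
```

Inside `getScreenshotData`, the `fetch(...)` chain could then become `requestScreenshots(screenShotData, imageFileLocationData).then(setData).catch((e) => setData(e.message));`.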

data-input.js:

import React, { useState } from 'react';

function DataInputForm({ getScreenshotData }) {
    const [tearsheetUrl, setTearsheetUrl] = useState('');
    const [imageFileLocation, setimageFileLocation] = useState('');

    return (
        <div>
            <form>
                <input 
                    id='tearsheetUrl'
                    name='tearsheetUrl'
                    placeholder='input tearsheet url'
                    type='text'
                    value={tearsheetUrl}
                    onChange={(event) => setTearsheetUrl(event.target.value)}
                />
                <input
                    id='imageFileLocation'
                    name='imageFileLocation'
                    placeholder='input image directory location'
                    type='text'
                    value={imageFileLocation}
                    onChange={(event) => setimageFileLocation(event.target.value)}
                />
            </form>
            <button id="submit-data" onClick={() => getScreenshotData(tearsheetUrl, imageFileLocation)}>SUBMIT</button>
        </div>
    );
}

export default DataInputForm;

Back end server.js:

const express = require('express');
const path = require('path');
const PORT = process.env.PORT || 3001;
const puppeteer = require('puppeteer-core');
const os = require('os');

// TODO/NICE TO HAVE: Figure out chrome paths for linux
const CHROME_PATHS = {
  darwin: '/Applications/Google Chrome.app/Contents/MacOS/Google Chrome',
  linux: undefined,
  win32: 'C:\\Program Files (x86)\\Google\\Chrome\\Application\\chrome.exe',
};
const CHROME_PATH = CHROME_PATHS[os.platform()];

const PREVIEW_SELECTOR = '.dynamic-ad-card-back iframe';
const NEXT_SELECTOR = '.md-icon-button[aria-label="Next"]';
const PIXEL_DENSITY = 2;
let DELAY_FOR_ANIMATION = 15000;

const app = express();

app.use(express.static(path.resolve(__dirname, '../dcsgrab/build')));

app.get('/dcsgrab', (request, response) => {
    (async () => {
        const browser = await puppeteer.launch({
            headless: true,
            executablePath: CHROME_PATH,
        });

        let screenshotCounter = 1;

        const page = await browser.newPage();

        await page.setViewport({width: 1280, height: 6000, deviceScaleFactor: PIXEL_DENSITY});

        await page.goto(request.query.tearsheetUrl, { waitUntil: 'networkidle0' });

        /**
         * Checks if the pagination button is active
         * @return {Promise.<Boolean>} Promise which resolves with a true boolean if the button is active
         */
        async function isNextButtonActive() {
            return await page.evaluate((selector) => {
                return !document.querySelector(selector).disabled;
            }, NEXT_SELECTOR);
        }

        /**
         * Clicks the pagination button
         * @return {Promise} Promise which resolves when the element matching selector is successfully clicked. The Promise will be rejected if there is no element matching selector
         */
        async function clickNextButton() {
            return await page.click(NEXT_SELECTOR, {delay: 100});
        }

        /**
         * Waits for the loading spinner widget to go away, indicating the iframes have been added to the page
         * @return {Promise.<undefined>}
         */
        async function waitForLoadingWidget() {
            return await page.waitForSelector('.preview-loading-widget', {hidden: true})
                .then(() => {
                    console.log('Loading widget is gone');
                })
                .catch((e) => {
                    console.log(e.message);
                });
        }

        /**
         * Gets the name of the tear sheet
         * @return {Promise<string>} The name
         */
        async function getSheetName() {
            return await page.evaluate((selector) => {
                return document.querySelector(selector).textContent.replace(/[*."/\\[\]:;|=,]/g, '-');
            }, '.preview-sheet-header-text span');
        }

        /**
         * Screenshot the creative elements on the current page
         * @return {Promise.<Array>} Promise which resolves with an array of clipping paths
         */
        async function getScreenShots() {
            const rects = await page.$$eval(PREVIEW_SELECTOR, (iframes) => {
                return Array.from(iframes, (el) => {
                    const {x, y, width, height} = el.getBoundingClientRect();

                    return {
                        left: x,
                        top: y,
                        width,
                        height,
                        id: el.id,
                    };
                });
            }).catch((e) => {
                console.error(e.message);
                // Fall back to an empty list so rects.map() below doesn't throw
                return [];
            });

            return Promise.all(rects.map(async (rect) => {
                return await page.screenshot({
                    path: `${request.query.imagefilelocation}${await getSheetName()}-screenshot-${screenshotCounter++}.png`,
                    clip: {
                        x: rect.left,
                        y: rect.top,
                        width: rect.width,
                        height: rect.height,
                    },
                }).then(() => {
                    console.log(`${rect.id} element captured.`);
                }).catch((e) => {
                    console.error(e.message);
                });
            }));
        }

        // Wait a bit then take screenshots
        await new Promise((resolve) => setTimeout(resolve, DELAY_FOR_ANIMATION));
        await getScreenShots().catch((e) => console.error(e.message));

        // Continue taking screenshots till there are no pages left
        while (await isNextButtonActive()) {
            await clickNextButton();
            await waitForLoadingWidget();
            await new Promise((resolve) => setTimeout(resolve, DELAY_FOR_ANIMATION));
            await getScreenShots().catch((e) => console.error(e.message));
        }

        await browser.close();

        response.json({ message: 'Screenshots are done!\nPlease check the root directory you previously designated.' });
    })();
});

app.get('*', (request, response) => {
    response.sendFile(path.resolve(__dirname, '../dcsgrab/build', 'index.html'));
});

app.listen(PORT, () => {
    console.log(`Server is listening on port ${PORT}`);
});
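In the Heroku log, the puppeteer-core stack trace is immediately followed by an H13 "Connection closed without response", which suggests the rejection from the async function inside the `/dcsgrab` handler went unhandled and took the process down before any response was sent. A minimal sketch of a wrapper (the name `asyncHandler` is hypothetical) that converts such a rejection into a 500 JSON response instead:

```javascript
// Hypothetical wrapper: runs an async route handler and turns any thrown
// error into a 500 JSON response rather than an unhandled rejection.
// It returns the promise only so callers (or tests) can wait on it.
function asyncHandler(handler) {
  return (request, response) =>
    Promise.resolve()
      .then(() => handler(request, response))
      .catch((error) => {
        console.error(error);
        if (!response.headersSent) {
          response.status(500).json({ message: error.message });
        }
      });
}
```

Usage would look like `app.get('/dcsgrab', asyncHandler(async (request, response) => { /* existing body */ }));`. The front end would then receive an error message it can display, and the log would still show the original error.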

What is the platform you've deployed to on Heroku? They are normally Linux servers, which means your `executablePath` is `undefined`, hence the error you see. — Randy Casburn, Mar 09 '23 at 19:36
Sorry, I'm not sure I understand what you're asking; I don't recall any part of the Heroku deployment process asking me to specify a platform. This app was built with Node and React on macOS and deployed to Heroku using the Heroku CLI. — tganyan, Mar 09 '23 at 20:34
I think I understand my confusion now. I was under the impression that the executablePath referred to the OS of the user, not the OS of the server running the application. Also, I'm in the middle of reading the docs myself, so your snarky "Thank you for the privilege of reading the documentation for you" isn't necessary and will be flagged. Wanting clarity on your vaguely worded question doesn't make me clueless about the tools in my toolbox; your entire comment is uncalled for. — tganyan, Mar 10 '23 at 17:12
Fair enough, but you labeled the offending code "_Back end server.js_", so an objective viewer would surmise you know that it is actually running on "_the OS of the server_". With all that said, the `executablePath` is optional and is computed automatically. I'm not sure you even need to stress over that particular option. Try removing that property from your configuration and see if that fixes things for you. [The configuration docs](https://pptr.dev/api/puppeteer.configuration#properties) and the [`executablePath` docs](https://pptr.dev/api/puppeteer.configuration.executablepath). — Randy Casburn, Mar 10 '23 at 17:41
It's not the core reason this isn't running properly. It's likely related and needs to be cleaned up, so that's what I'm doing now, but I don't think it will resolve the issue, since something is going on with the GET request, possibly the browser trying to access the local file system. I don't think removing it is an option, as doing so throws other errors; I think [this question](https://stackoverflow.com/questions/74251875/puppeteer-error-an-executablepath-or-channel-must-be-specified-for-puppete) relates to that. — tganyan, Mar 10 '23 at 17:50
At this point I'm trying to find clear information on which Chrome path to specify for Linux and haven't found it yet, but I'll keep searching. — tganyan, Mar 10 '23 at 17:51
There are lots of reasons why puppeteer won't work on Heroku, but the most common is that the site is blocking datacenter traffic. — pguardiario, Mar 11 '23 at 01:18
