
Does anyone have an idea how to invoke a JavaScript function from Puppeteer when the function is defined in an external .js file rather than inline? If it's inline within the html->head->script tag it works, but not if the script tag points to an external .js file.

Sample HTML File

<html>
    <head>
        <script type="text/javascript">
            function inlineFunction()  {
                window.location.replace('https://www.geeksforgeeks.org');
            }
        </script>
        <script src="script.js" type="text/javascript">
        </script>
    </head>
    <body>
        <p>Hello</p>
        <p>this is an online html</p>
        <p>Link with tag a <a href="https://www.geeksforgeeks.org" name="arivalink">Href Link</a></p>
        <p>Link with inline java script - <a href="#" onClick='inlineFunction();'>Inline JS link</a></p><!-- Works -->
        <p>Link with external JS file w/o tagname - <a href="#" onClick='fileFunction();'>Ext JS Link</a></p><!-- Does not work -->
        <p>Link with external JS file w/ tagname - <a href="#" onClick='fileFunction();' name="geeksLink">Ext JS Link</a></p><!-- Does not work -->
    </body>
</html>

Sample Javascript file

/*----------------------------------------------------*/
/* External Javascript File                           */
/*----------------------------------------------------*/

function fileFunction() {

    window.location.replace('https://www.geeksforgeeks.org');

}

Puppeteer code sample

const puppeteer = require('puppeteer');

async function start() {
    const browser = await puppeteer.launch({
        headless: false
    });

    const page = await browser.newPage();

    //Change the path of "url" to your local path for the html file
    const url = 'file:///Users/sam.gajjar/SG/Projects/headless-chrome/sample.html'; 
    var link = '[name="link"]';

    console.log("Main URL Called");
    await page.goto(url);

    console.log("Link via HTML tag A called");
    await page.click(link);

    await page.waitForTimeout(5000) // Wait 5 seconds
        .then(() => page.goBack());
    
    console.log("Calling inline JS Function");
    await page.evaluate(() => inlineFunction());

    await page.waitForTimeout(5000) // Wait 5 seconds
        .then(() => page.goBack());

    console.log("Calling extjs file Function");
    await page.evaluate(() => fileFunction());

    await page.waitForTimeout(5000) // Wait 5 seconds
        .then(() => page.goBack());

    // console.log("Calling extjs file Function w/tag name");
    // const element = await page.$$('[a href="#"]');

    // await page.waitForTimeout(5000)
        // .then(() => page.goBack());
}

start();
Sam Gajjar
    @ggorlen - Have added the puppeteer code where the call to HTML A tag and call in inline JS function works but the same function in an external JS does not work. The puppeteer code will call Link with HTML A then return back and call inline JS function, then return back and then call extJS function where it fails. – Sam Gajjar May 18 '21 at 17:52

1 Answer


First of all, [name="link"] should be [name="arivalink"] to match your DOM. I assume that's a typo.

As another aside, I recommend using the Promise.all navigation pattern instead of waitForTimeout which can cause race conditions (although this doesn't appear to be related to the problem in this case).
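
As a rough illustration of that pattern, here is a minimal sketch. `clickAndWait` is a hypothetical helper name, not part of Puppeteer; it assumes only that `page` exposes `waitForNavigation()` and `click(selector)`, as Puppeteer's `Page` does.

```javascript
// Hypothetical helper (not a Puppeteer API): click something that triggers
// a navigation and resolve only once that navigation has finished.
async function clickAndWait(page, selector) {
  // Register the navigation wait first, then click. Promise.all starts both
  // calls back to back, so the navigation cannot slip past the listener.
  const [response] = await Promise.all([
    page.waitForNavigation(),
    page.click(selector),
  ]);
  return response;
}
```

With the corrected selector, usage would look like `await clickAndWait(page, '[name="arivalink"]');` in place of the click followed by a fixed 5-second wait.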

As for the main issue, the external file is working just fine, so that's a red herring. You can prove that by calling page.evaluate(() => fileFunction()) right after navigating to sample.html.
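
One way to check this directly is to ask the page whether the function exists as a global. This is only a sketch: `hasGlobalFunction` is a hypothetical helper, and it relies on `fileFunction` being a top-level `function` declaration in a classic script (as in the question's script.js), which makes it a property of `window`. Top-level `const` or `let` bindings would not show up this way.

```javascript
// Hypothetical helper (not a Puppeteer API): report whether the page has a
// global function with the given name.
async function hasGlobalFunction(page, name) {
  // The callback runs in the browser. `name` is passed as an argument
  // because the callback cannot see Node-side variables.
  return page.evaluate((n) => typeof window[n] === "function", name);
}
```

Calling `await hasGlobalFunction(page, "fileFunction")` right after `page.goto(url)` should tell you whether the external script loaded before you go looking for other causes.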

The real problem is that when you navigate with window.location.replace('https://www.geeksforgeeks.org');, Chromium isn't pushing that action onto the history stack. It's replacing the current URL, so page.goBack() goes back to the original about:blank rather than sample.html as you expect. about:blank doesn't have fileFunction in it, so Puppeteer throws.
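
To make the push/replace difference concrete, here is a toy model of a tab's history in plain JavaScript. `ToyHistory` is made up for illustration and is not how Chromium implements session history; it only mirrors the behavior described above, assuming the tab starts on about:blank and `page.goto` adds an entry after it.

```javascript
// Toy model of a tab's session history. Illustration only.
class ToyHistory {
  constructor(initialUrl) {
    this.entries = [initialUrl];
    this.index = 0;
  }
  get current() {
    return this.entries[this.index];
  }
  // Ordinary navigation: drop any forward entries, then add a new entry.
  push(url) {
    this.entries = this.entries.slice(0, this.index + 1);
    this.entries.push(url);
    this.index += 1;
  }
  // location.replace(): overwrite the current entry, add nothing.
  replace(url) {
    this.entries[this.index] = url;
  }
  // Back button / goBack(): step to the previous entry if there is one.
  back() {
    if (this.index > 0) this.index -= 1;
    return this.current;
  }
}

const tab = new ToyHistory("about:blank");
tab.push("sample.html");                      // page.goto(url)
tab.replace("https://www.geeksforgeeks.org"); // fileFunction() via evaluate
console.log(tab.back());                      // about:blank (sample.html was overwritten)

const tab2 = new ToyHistory("about:blank");
tab2.push("sample.html");                     // page.goto(url)
tab2.push("https://www.geeksforgeeks.org");   // a normal link click
console.log(tab2.back());                     // sample.html
```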

Now, when you click [name="arivalink"] with Puppeteer, that does push a new entry onto the history stack, so goBack works just fine.

You can reproduce this behavior by loading sample.html in a browser and navigating it by hand without Puppeteer.

Long story short, if you're calling a function in browser context using evaluate that runs window.location.replace, you can't rely on page.goBack. You'll need to use page.goto to get back to sample.html.
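
A minimal sketch of that workaround follows. `evaluateAndReturn` is a hypothetical helper, not a Puppeteer API; it assumes `pageFunction` triggers a navigation and that `returnUrl` is the page you want to end up on afterwards.

```javascript
// Hypothetical helper (not a Puppeteer API): run a page function that
// navigates away, then return to a known URL with goto instead of goBack.
async function evaluateAndReturn(page, pageFunction, returnUrl) {
  await Promise.all([
    page.waitForNavigation(),
    page.evaluate(pageFunction),
  ]);
  const landedOn = page.url(); // where the page function took us
  await page.goto(returnUrl);  // does not depend on the history stack
  return landedOn;
}
```

In the question's script this would replace the `evaluate` plus `goBack` pair, for example `await evaluateAndReturn(page, () => fileFunction(), url);`.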

There's an interesting nuance: if you use page.click to invoke JS that runs location.replace("..."), Puppeteer will push the history stack and page.goBack will behave as expected. If you invoke the same JS logic with page.evaluate(() => location.replace("..."));, Puppeteer won't push the current URL to the history stack and page.goBack won't work as you expect. The evaluate behavior better aligns with "manual" browsing (i.e. as a human with a mouse and keyboard on a GUI).

Here's code to demonstrate all of this. Everything goes in the same directory and node index.js runs Puppeteer (I used Puppeteer 9.0.0).

script.js

const replaceLocation = () => location.replace("https://www.example.com");
const setLocation = () => location = "https://www.example.com";

sample.html

<!DOCTYPE html>
<html lang="en">
<head>
  <title>sample</title>
</head>
<body>
  <div>
    <a href="https://www.example.com">normal link</a> | 
    <a href="#" onclick="replaceLocation()">location.replace()</a> | 
    <a href="#" onclick="setLocation()">location = ...</a>
  </div>
  <script src="script.js"></script>
</body>
</html>

index.js

const puppeteer = require("puppeteer");

const url = "file:///Users/sam.gajjar/SG/Projects/headless-chrome/sample.html";
const log = (() => {
  let logId = 0;
  return (...args) => console.log(logId++, ...args);
})();
let browser;

(async () => {
  browser = await puppeteer.launch({
    headless: false, 
    slowMo: 500,
  });
  const [page] = await browser.pages();
  await page.goto(url);

  // display the starting location
  log(page.url()); // 0 sample.html
  
  // click the normal link and pop the browser stack with goBack
  await Promise.all([
    page.waitForNavigation(),
    page.click("a:nth-child(1)"),
  ]);
  log(page.url()); // 1 example.com
  await page.goBack();
  log(page.url()); // 2 sample.html
  
  // fire location.replace with click
  await Promise.all([
    page.waitForNavigation(),
    page.click("a:nth-child(2)"), // pushes history (!)
  ]);
  log(page.url()); // 3 example.com
  await page.goBack();
  log(page.url()); // 4 sample.html

  // fire location.replace with evaluate
  await Promise.all([
    page.waitForNavigation(),
    page.evaluate(() => replaceLocation()), // doesn't push history
  ]);
  log(page.url()); // 5 example.com
  await page.goBack();
  log(page.url()); // 6 about:blank <--- here's your bug!
  
  await page.goto(url); // go to sample.html from about:blank <-- here's the fix
  log(page.url()); // 7 sample.html
  
  // use location = and see that goBack takes us to sample.html
  await Promise.all([
    page.waitForNavigation(),
    page.evaluate(() => setLocation()), // same behavior as page.click
  ]);
  log(page.url()); // 8 example.com
  await page.goBack();
  log(page.url()); // 9 sample.html
})()
  .catch(err => console.error(err))
  .finally(async () => await browser.close())
;
ggorlen
  • So this works and thanks a lot for this. However I had kind of simplified the real situation by creating a sample.html and external JS file.. but this does not work for the actual case. Am working with a 20 year old ecommerce site and here is the actual link that I have in the page. Now question.. Does it matter if there are multiple external JS and this link is in nested frameset which is 6 level deep frame? Shopping Basket – Sam Gajjar May 18 '21 at 22:25
  • What doesn't work exactly? If you open a new question, with the complete failing code, I can try to help. It's good to do a minimal example, but if the example has tainted/confusing behavior like `location.replace` borking the `goBack` logic, and that's not part of your original intent, then the example is too far from your original scenario. – ggorlen May 18 '21 at 22:28
  • Again, please open a new question and include all relevant details. All of these details like 6 layers of frames is important information that's fundamentally different than you've shown here. – ggorlen May 18 '21 at 22:30
  • Yeah, I would agree, but in the real scenario I don't need to use goBack(); it was only in the simplified scenario, where all the links were in sample.html, that I wanted to show the behavior with variations of links using JS – Sam Gajjar May 18 '21 at 22:32
  • That's fine, but it turns out that the `goBack`/`location.replace` stuff inadvertently created a bunch of new (but interesting) problems that are unrelated to the problem you're describing now. I assumed this was your real use case. – ggorlen May 18 '21 at 22:34
  • Apologies but being new to puppeteer I was not sure if it was the external JS not getting called.. v/s the actual scenario... Have posted a new question if it interests you. https://stackoverflow.com/questions/67594856/puppeteer-invoke-external-js-function-from-a-6-level-deep-page-in-frameset – Sam Gajjar May 18 '21 at 22:45
  • Appreciate if you can look at - https://stackoverflow.com/questions/67612278/puppeteer-invoke-external-js-function-from-a-5th-level-deep-child-page-in-fram – Sam Gajjar May 20 '21 at 23:38