
I want to inject some HTML into a specific element on a page using Puppeteer.

The HTML must be injected before any JavaScript is executed.

There are two ways I think I could do this:

  1. Inject HTML using page.evaluateOnNewDocument

This function "is invoked after the document was created", but I can't access DOM elements from it. For example:

const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();

  page.on('console', consoleObj => console.log(consoleObj.text()));

  await page.evaluateOnNewDocument(
    () => {
      const content = document.querySelector('html');
      console.log(content);
    }
  );

  await page.goto(process.argv[2]);

  await browser.close();
})();

This script just outputs newlines when I visit a page.

  2. Disable JavaScript with page.setJavaScriptEnabled so that it can't execute before I inject the HTML. As per the docs, though, turning JavaScript back on doesn't make the page's scripts execute; the change only takes full effect on the next navigation. My script looks something like this:

const fs = require('fs');
const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();

  const html = fs.readFileSync('./example.html', 'utf8');

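  // Disable JavaScript before navigating so the page's scripts don't run.
  await page.setJavaScriptEnabled(false);
  await page.goto(process.argv[2]);
  await page.evaluate(
    content => {
      const pageEl = document.querySelector('div.page');
      let node = document.createElement('div');
      node.innerHTML = content;
      pageEl.appendChild(node);
    }, html
  );
  // Re-enabling only takes effect on the next navigation,
  // so the page's scripts still don't run at this point.
  await page.setJavaScriptEnabled(true);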

  await browser.close();
})();

Alternatively, it may be possible to do something like this, though that seems overly complex for what is a fairly simple request.

Is there an easier way to do this that I am overlooking?

Cheers

seth
  • As for the first way: it seems there is no DOM yet at the time the script executes. As for the second way: it seems `setJavaScriptEnabled()` has no impact on `page.evaluate()`. It is a bit unclear what constraints you have: do you need to insert an element after the DOM is created but before any page script is executed? – vsemozhebuty Jan 31 '19 at 16:04
  • Yes, the HTML must be injected into a specific element, so after the DOM is loaded, but before any JavaScript is executed. Re-enabling JavaScript with `setJavaScriptEnabled(true)` doesn't have an impact until the page navigates again – seth Jan 31 '19 at 16:19
  • Maybe you can try to call `page.evaluate()` on `'domcontentloaded'` page event, but success seems unpredictable. – vsemozhebuty Jan 31 '19 at 16:24
  • Or maybe you can set [`MutationObserver`](https://developer.mozilla.org/en-US/docs/Web/API/MutationObserver) with `evaluateOnNewDocument()` to catch the moment the needed node is added. – vsemozhebuty Jan 31 '19 at 16:28
  • Thanks for your suggestions. To give you a little more context, I'm trying to inject HTML before a jQuery event listener is added. I attempted to use `domcontentloaded`, but it didn't work (https://pastebin.com/zVNvDXGF). This snippet isn't run early enough to intercept the jQuery event listener being added (meaning the element won't be added early enough). – seth Jan 31 '19 at 17:38
  • Another option: given jQuery script URL and a proper place (or just line 1), you can try to pause the script via [`CDPSession`](https://github.com/GoogleChrome/puppeteer/blob/master/docs/api.md#class-cdpsession) call with ['Debugger.setBreakpointByUrl'](https://chromedevtools.github.io/devtools-protocol/tot/Debugger#method-setBreakpointByUrl), then insert the element, then [remove breakpoint](https://chromedevtools.github.io/devtools-protocol/tot/Debugger#method-removeBreakpoint) and [resume the script](https://chromedevtools.github.io/devtools-protocol/tot/Debugger#method-resume). – vsemozhebuty Jan 31 '19 at 18:01
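The MutationObserver suggestion above could look roughly like this: a function handed to page.evaluateOnNewDocument that waits for the target element and injects the HTML as soon as the parser creates it. This is a sketch under assumptions, not a tested solution: `injectWhenPresent` is a hypothetical name, and it assumes the target (`div.page` in the question) is parsed before the script that attaches the jQuery listener. Mutation observer callbacks run as microtasks, and the HTML parser performs a microtask checkpoint before executing each parser-inserted script, so the injection should land before a later synchronous script runs.

```javascript
// Browser-side sketch, meant to be passed to page.evaluateOnNewDocument.
// `injectWhenPresent` is a hypothetical helper, not a Puppeteer API.
function injectWhenPresent(selector, html) {
  // Inject into the target if it exists; report whether it did.
  const tryInject = () => {
    const target = document.querySelector(selector);
    if (!target) return false;
    // Appends to whatever the element contains at this point in parsing.
    target.insertAdjacentHTML('beforeend', html);
    return true;
  };

  if (tryInject()) return;

  // The document is usually still empty here, so watch for added nodes
  // and stop watching once the injection has happened.
  const observer = new MutationObserver(() => {
    if (tryInject()) observer.disconnect();
  });
  observer.observe(document, { childList: true, subtree: true });
}
```

With `html` read from a file as in the question's second script, it would be registered before navigating: `await page.evaluateOnNewDocument(injectWhenPresent, 'div.page', html);` followed by `await page.goto(process.argv[2]);`.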

3 Answers


It appears that this is actually a very popular request and I perhaps should have searched more thoroughly before posting my question.

Nevertheless, I settled on the solution proposed by aslushnikov here.

The following code is just what I produced to test the idea, I'm sure there's significant room for improvement.

I made a simple function to perform XHRs:

// Runs in the browser (it uses XMLHttpRequest), not in Node.
const requestPage = (url) => {
  return new Promise(function (resolve, reject) {
    let xhr = new XMLHttpRequest();
    xhr.open('GET', url);
    // Marker header so the request interception handler lets this XHR through.
    xhr.setRequestHeader('Ignore-Intercept', 'Value');
    xhr.onload = function () {
      if (this.status >= 200 && this.status < 300) {
        const response = {};
        xhr.getAllResponseHeaders()
          .trim()
          .split(/[\r\n]+/)
          .forEach(line => {
            // Split on the first ": " only, since header values may contain it.
            const separator = line.indexOf(': ');
            if (separator > 0) {
              response[line.slice(0, separator).trim()] = line.slice(separator + 2).trim();
            }
          });
        resolve({ ...response, body: xhr.response });
      } else {
        reject({
          status: this.status,
          statusText: xhr.statusText
        });
      }
    };
    xhr.onerror = function () {
      reject({
        status: this.status,
        statusText: xhr.statusText
      });
    };
    xhr.send();
  });
};

I then made this function available in the page. It relies on XMLHttpRequest, so it has to run in the browser context rather than in Node.

Then, instead of letting the intercepted request go ahead, I used this function to perform the same request as an XHR and used its result as the response:

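// `url` is the address of the page to modify.
await page.setRequestInterception(true);
page.on('request', async (request) => {
  const headers = request.headers(); // header names are lower-case
  if (
    request.url() === url
    // Let through the CORS preflight for our own XHR...
    && (
      typeof headers['access-control-request-headers'] === 'undefined'
      || !headers['access-control-request-headers'].match(/ignore-intercept/gi)
    )
    // ...and the XHR itself, which carries the marker header.
    && typeof headers['ignore-intercept'] === 'undefined'
  ) {
    // Fetch the page ourselves, modify the body and use it as the response.
    const response = await page.evaluate(`requestPage('${url}')`);
    response.body += "hello";
    request.respond(response);
  } else {
    request.continue();
  }
});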

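// Encode the markup so characters such as # in `url` don't end the data: URL early.
await page.goto(`data:text/html,${encodeURIComponent(`<iframe style="width:100%; height:100%" src="${url}"></iframe>`)}`);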
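The condition in the 'request' handler distinguishes three cases: the page request itself, the XHR made by requestPage, and the CORS preflight for that XHR. Pulling it into a named function makes those cases explicit and lets them be checked without a browser. A sketch; `shouldIntercept` is a hypothetical name, and it relies on request.headers() reporting header names in lower case, as Puppeteer documents.

```javascript
// Hypothetical helper equivalent to the inline condition in the handler.
// `headers` is the object returned by request.headers().
function shouldIntercept(requestUrl, headers, targetUrl) {
  // Only the page we want to modify is intercepted.
  if (requestUrl !== targetUrl) return false;

  // Our own XHR from requestPage() carries the marker header: let it through.
  if (headers['ignore-intercept'] !== undefined) return false;

  // A CORS preflight for that XHR announces the marker header in
  // Access-Control-Request-Headers: let it through as well.
  const requested = headers['access-control-request-headers'];
  if (requested !== undefined && /ignore-intercept/i.test(requested)) return false;

  // Anything else for the target URL (the iframe's navigation) is intercepted.
  return true;
}
```

Inside the handler this becomes `if (shouldIntercept(request.url(), request.headers(), url)) { ... } else { request.continue(); }`. The regex drops the `g` flag so that `test()` carries no `lastIndex` state between calls.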

Annoyingly, it didn't seem possible to use page.evaluate unless the desired page was in an iframe (hence the `page.goto('data:text/html,...')` call).
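The handler appends "hello" to the very end of the body, whereas the question asks for HTML inside a specific element (`div.page` in its second script). Because the body is still a string before request.respond() is called, the HTML could instead be spliced in right after that element's opening tag, so it is part of the served markup before any of the page's scripts run. A rough sketch: `injectAfterOpeningTag` is a hypothetical helper, and matching a tag with a regular expression assumes the opening tag is simple enough to match reliably; it is not a general HTML parser.

```javascript
// Hypothetical helper: insert `html` immediately after the first opening tag
// matched by `openingTagPattern` (a RegExp without the g flag).
function injectAfterOpeningTag(body, openingTagPattern, html) {
  const match = openingTagPattern.exec(body);
  if (!match) {
    // Target not found: leave the body unchanged.
    return body;
  }
  const insertAt = match.index + match[0].length;
  return body.slice(0, insertAt) + html + body.slice(insertAt);
}
```

In the handler, `response.body += "hello";` would become something like `response.body = injectAfterOpeningTag(response.body, /<div\s+class="page"[^>]*>/i, html);`.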

seth

With the following snippet I was able to augment the body. I use this for mocking purposes.

const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch();
  browser.on('targetchanged', async target => {
    // Only page targets have a Page; others (e.g. workers) return null.
    const targetPage = await target.page();
    if (!targetPage) return;
    const client = await targetPage.target().createCDPSession();
    await client.send('Runtime.evaluate', {
      expression: `
        window.document.addEventListener("DOMContentLoaded", function () {
          const container = window.document.createElement('span');
          container.innerText = "Hello World!";
          window.document.body.appendChild(container);
        });
      `,
    });
  });
  // ...open pages and navigate as usual.
})();

I'm not entirely sure what targetchanged is. My assumption from fiddling with it is that it's emitted when the browser goes to a particular page "target", but I could be wrong.
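The expression above hard-codes its content. If a target selector and arbitrary HTML are needed, both can be embedded with JSON.stringify so that quotes or newlines in the HTML cannot break the expression. A sketch; `buildInjectionExpression` is a hypothetical name. Note that DOMContentLoaded fires after the page's synchronous scripts have run, so this suits mocking as described here rather than the question's requirement of injecting before any JavaScript executes.

```javascript
// Hypothetical helper: builds the `expression` string for Runtime.evaluate.
// JSON.stringify turns the selector and HTML into valid JS string literals.
function buildInjectionExpression(selector, html) {
  return `
    window.document.addEventListener('DOMContentLoaded', function () {
      const target = window.document.querySelector(${JSON.stringify(selector)});
      if (target) {
        target.insertAdjacentHTML('beforeend', ${JSON.stringify(html)});
      }
    });
  `;
}
```

It would replace the inline template: `await client.send('Runtime.evaluate', { expression: buildInjectionExpression('body', '<span>Hello World!</span>') });`.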


Nate-Wilkins

You can use Page.evaluateOnNewDocument to run JS in which you can manipulate the DOM.

https://pptr.dev/#?product=Puppeteer&version=v5.2.1&show=api-pageevaluateonnewdocumentpagefunction-args
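As noted in the comments on the question, scripts registered this way run before the page's content has been parsed, so DOM manipulation inside them generally has to be deferred. One common pattern checks document.readyState and waits for DOMContentLoaded only while parsing is still in progress. A sketch; `runWhenDomReady` is a hypothetical name, and the document is passed in as a parameter so the helper can also be exercised outside a browser. Because evaluateOnNewDocument serializes only the function it is given, the helper would have to be defined inside that function. Like any DOMContentLoaded-based approach, the callback runs after the page's synchronous scripts.

```javascript
// Hypothetical helper: call `callback` once the DOM has been parsed.
function runWhenDomReady(doc, callback) {
  if (doc.readyState === 'loading') {
    // Still parsing: wait for the DOM to be complete.
    doc.addEventListener('DOMContentLoaded', () => callback(doc), { once: true });
  } else {
    // 'interactive' or 'complete': the DOM is already available.
    callback(doc);
  }
}
```

Inside the function passed to evaluateOnNewDocument, usage would be along the lines of `runWhenDomReady(document, doc => doc.querySelector('div.page').insertAdjacentHTML('beforeend', html));`.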

miroB