
I have a Raspberry Pi 4 and I want to generate, from the terminal, a website.html file containing the complete rendered HTML of a webpage, for example so that I can search the whole page for a string or pattern. I can do this with something like wget or curl, for example `wget -O website.html https://www.example.com`. That is all I want, except that it doesn't execute JavaScript.

Some websites (like Google) build almost everything with JavaScript, so I cannot get the final HTML that way.

  • I have been searching all day for a working solution and found that I need something like a headless browser. I have tried tools like PhantomJS, but they don't work and are no longer maintained.
  • I have tried Puppeteer, but I was only able to grab a screenshot, not the HTML. I thought that page.content() had what I wanted, but I couldn't write its result to a file. When I passed it to console.log I saw JavaScript in the output as well. If someone knows how to write a file with the final HTML using Puppeteer, please tell me.

Isn't there an 'easy' solution like wget that also runs JavaScript? Is there a simple workflow or set of instructions to achieve this?

If you know some working commands for this, please share them. I find some tools very complicated, and I am not familiar enough with every programming language to make them work.

Any help would be greatly appreciated.

1 Answer


Once Node.js and Puppeteer are installed, you can use this simple script to save the page's HTML after its JavaScript has been executed. Use it as:

node script.js url pagename

Both arguments are optional. For test purposes, the default url is 'http://example.com/' and the default pagename is 'page-<timestamp>.html' in the current directory, where <timestamp> is the value of Date.now().

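const fs = require('fs');
const puppeteer = require('puppeteer');

// Command-line arguments, with defaults for quick testing.
const url = process.argv[2] || 'http://example.com/';
const path = process.argv[3] || `page-${Date.now()}.html`;

(async function main() {
  // Start a headless browser and reuse the tab it opens by default.
  const browser = await puppeteer.launch();
  const [page] = await browser.pages();

  // Wait until there have been no network connections for at least 500 ms,
  // so that scripts get a chance to finish building the page.
  await page.goto(url, { waitUntil: 'networkidle0' });
  // page.content() returns the full HTML of the page as it is now, scripts included.
  fs.writeFileSync(path, await page.content());

  await browser.close();
})().catch(console.error);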
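
Since the goal in the question is to search the saved page for a string or pattern, here is a minimal sketch of how that step might look in Node.js as well. The file name search.js and the helpers escapeRegExp and findMatches are hypothetical names chosen for illustration, not part of Puppeteer; plain grep on the saved file would do the same job.

```javascript
const fs = require('fs');

// Hypothetical helper: escape characters that have a special meaning in a RegExp,
// so that a plain string is matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical helper: return every line of `html` that matches `pattern`
// (a plain string or a RegExp), together with its 1-based line number.
function findMatches(html, pattern) {
  const regex = typeof pattern === 'string'
    ? new RegExp(escapeRegExp(pattern))
    // Drop the g and y flags so that test() keeps no state between lines.
    : new RegExp(pattern.source, pattern.flags.replace(/[gy]/g, ''));
  const matches = [];
  html.split(/\r?\n/).forEach((text, index) => {
    if (regex.test(text)) matches.push({ line: index + 1, text: text.trim() });
  });
  return matches;
}

// Command-line use (assumed invocation): node search.js website.html "some text"
const [file, needle] = process.argv.slice(2);
if (file && needle) {
  const html = fs.readFileSync(file, 'utf8');
  for (const { line, text } of findMatches(html, needle)) {
    console.log(`${line}: ${text}`);
  }
}
```

Note that rendered pages are often minified into very few lines, so a single matching "line" can be long.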
vsemozhebuty
    Thank you so much for your help. It works very well. If anyone trying this gets "Error: Failed to launch the browser process!", try replacing the line `const browser = await puppeteer.launch();` with `const browser = await puppeteer.launch({ executablePath: '/usr/bin/chromium-browser' });`. It worked for me! – aris melachroinos Dec 30 '20 at 09:12
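
The workaround in the comment above could be made conditional, so the same script runs both where Puppeteer's bundled browser works and where only a system Chromium does. This is only a sketch: resolveLaunchOptions is a hypothetical helper, and the second candidate path, /usr/bin/chromium, is an assumption about where some distributions install the browser. Only executablePath itself is a Puppeteer launch option.

```javascript
const fs = require('fs');

// Hypothetical helper: pick the first browser binary that exists on this machine.
// '/usr/bin/chromium-browser' comes from the comment above; '/usr/bin/chromium'
// is an assumed alternative location.
function resolveLaunchOptions(candidates = ['/usr/bin/chromium-browser', '/usr/bin/chromium']) {
  const executablePath = candidates.find((candidate) => fs.existsSync(candidate));
  // With no match, return empty options so Puppeteer uses its own default browser.
  return executablePath ? { executablePath } : {};
}

// Usage inside the answer's script (requires Puppeteer to be installed):
//   const browser = await puppeteer.launch(resolveLaunchOptions());
```

If neither path exists, the call behaves exactly like the original `puppeteer.launch()`.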