
I'm using PhantomJS 2.1.1 on CentOS with the example script rasterize.js to reproduce a screenshot problem: taking a screenshot of a simple web-font demo site:

http://castellsonclaret.com/public/external/georgiapro/demo.htm

It should render like so (as taken using the SaaS PhantomJSCloud service):

[Image: correct rendering]

However, with PhantomJS 2.1.1 locally, I get:

[Image: failed rendering with incorrect colours]


First, I increased the script timeout to 10 s to rule out a timing issue.

Next, I suspected the CSS or fonts were somehow being blocked from downloading. Capturing traffic with tcpflow (similar to Wireshark) while the PhantomJS script runs shows that the page does download the .woff fonts. However, they are not rendered in the screenshot.

This is the capture command, started before the PhantomJS script:

tcpflow -p -c -i eth0 port 80 | grep -oE '(GET|POST|HEAD) .* HTTP/1.[01]'

The actual console output confirms the font requests:

GET /public/external/georgiapro/demo.htm HTTP/1.1
GET /t/1.css?apiType=css&projectid=4a82c0c9-a48a-4ef5-97ae-de0d7e62c8d0 HTTP/1.1
GET /public/external/georgiapro/Fonts/a5d15255-f5b4-4cca-808f-211ec0f25ac8.woff HTTP/1.1
GET /public/external/georgiapro/Fonts/3859825b-bdc4-47f3-af3d-a2ef42d58cfb.woff HTTP/1.1
... [snip] ...
GET /public/external/georgiapro/Fonts/ab79a7ac-4aaf-4393-896b-feb6610c9528.woff HTTP/1.1
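tcpflow shows what goes over the wire, not what PhantomJS does with it. A PhantomJS script along these lines could log each font response PhantomJS receives, plus any resource errors, before rendering. This is a sketch assuming PhantomJS 2.1's `webpage` module callbacks `onResourceReceived` and `onResourceError`; the file name `font-log.js`, the output `font-log.png` and the helper `isFontUrl` are hypothetical, and the PhantomJS-only part is skipped if the file is loaded under Node.

```javascript
// font-log.js: sketch, run with `phantomjs font-log.js <url>` (PhantomJS 2.1 assumed)

// Hypothetical helper: true for URLs that look like web-font files.
function isFontUrl(url) {
  // Drop any query string or fragment before checking the extension.
  var path = url.split(/[?#]/)[0].toLowerCase();
  return /\.(woff2?|ttf|otf|eot)$/.test(path);
}

// Only wire up the page when actually running under PhantomJS.
if (typeof phantom !== 'undefined') {
  var page = require('webpage').create();
  var url = require('system').args[1] ||
    'http://castellsonclaret.com/public/external/georgiapro/demo.htm';

  page.viewportSize = { width: 1440, height: 900 };

  // Called per response chunk; stage 'end' means the response completed.
  page.onResourceReceived = function (response) {
    if (response.stage === 'end' && isFontUrl(response.url)) {
      console.log('font', response.status, response.contentType, response.url);
    }
  };

  // Called when a request fails (network error, SSL problem, etc.).
  page.onResourceError = function (err) {
    console.log('error', err.errorCode, err.errorString, err.url);
  };

  page.open(url, function (status) {
    console.log('load status:', status);
    // Wait 10 s, as in the question, then render and exit.
    window.setTimeout(function () {
      page.render('font-log.png');
      phantom.exit();
    }, 10000);
  });
}
```

If the fonts show up with status 200 and a font content type but the render still falls back to another typeface, the problem is in decoding or applying the font rather than in downloading it.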

I then wondered whether PhantomJS 2.x still lacks WOFF support, but 1) WOFF is supposed to be supported (see here), and 2) the SaaS PhantomJSCloud service renders these fonts fine. Is there anything else needed to render web fonts?


Update: I've confirmed zlib is installed, and compiled PhantomJS 2.1.1 from source, but the results are still the same as above.


Update: Chrome now has headless support, which is the reason the maintainer of PhantomJS announced on April 13 that he is stepping down. We will eventually switch to headless Chrome. Can headless Chrome handle web fonts?

Drakes

2 Answers


After a lot of experimenting, tweaking, and reverse engineering of the PhantomJS source code, and given that PhantomJS is no longer maintained, I switched to headless Chrome (version 58 and later) driven from Node.js. It correctly takes screenshots of sites that use WOFF fonts.

Here is my setup for anyone interested.

Installing Node.js and NPM

yum install epel-release
yum install nodejs
node --version # to confirm successful install
yum install npm
# OR, for Node.js v8 from NodeSource (then run: yum install nodejs)
# curl -sL https://rpm.nodesource.com/setup_8.x | bash -
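The screenshot script below uses `async`/`await`, which Node.js supports without flags only from 7.6.0 onward; older releases such as the 6.x line reject it with a syntax error. A quick sketch to check the installed version (the file name `check-node.js` and the helper `supportsAsyncAwait` are hypothetical):

```javascript
// check-node.js: sketch, run with `node check-node.js`

// async/await is enabled by default from Node.js 7.6.0 onward.
function supportsAsyncAwait(version) {
  // process.versions.node looks like "8.9.4" (no leading "v").
  const [major, minor] = version.split('.').map(Number);
  return major > 7 || (major === 7 && minor >= 6);
}

const current = process.versions.node;
console.log(
  `Node.js ${current}: async/await ${supportsAsyncAwait(current) ? 'supported' : 'NOT supported'}`
);
```

If it reports "NOT supported", install the NodeSource v8 package instead of the distribution one.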

Installing Node.js modules

npm install chrome-remote-interface --no-bin-links --save
npm install minimist --no-bin-links --save

Installing Chrome on CentOS

cd /tmp
wget https://dl.google.com/linux/direct/google-chrome-stable_current_x86_64.rpm
yum -y localinstall google-chrome-*
google-chrome --version # to confirm successful install

Node.js screenshot driver script

Save this script as screenshot.js. The source of this script originally came from here. I've modified my version to be more flexible, but to give credit to the author, schnerd, I will reproduce it in its original form:

const CDP = require('chrome-remote-interface');
const argv = require('minimist')(process.argv.slice(2));
const file = require('fs');

// CLI Args
const url = argv.url || 'https://www.google.com';
const format = argv.format === 'jpeg' ? 'jpeg' : 'png';
const viewportWidth = argv.viewportWidth || 1440;
const viewportHeight = argv.viewportHeight || 900;
const delay = argv.delay || 0;
const userAgent = argv.userAgent;
const fullPage = argv.full;

// Start the Chrome Debugging Protocol
CDP(async function(client) {
  // Extract used DevTools domains.
  const {DOM, Emulation, Network, Page, Runtime} = client;

  // Enable events on domains we are interested in.
  await Page.enable();
  await DOM.enable();
  await Network.enable();

  // If user agent override was specified, pass to Network domain
  if (userAgent) {
    await Network.setUserAgentOverride({userAgent});
  }

  // Set up viewport resolution, etc.
  const deviceMetrics = {
    width: viewportWidth,
    height: viewportHeight,
    deviceScaleFactor: 0,
    mobile: false,
    fitWindow: false,
  };
  await Emulation.setDeviceMetricsOverride(deviceMetrics);
  await Emulation.setVisibleSize({width: viewportWidth, height: viewportHeight});

  // Navigate to target page
  await Page.navigate({url});

  // Wait for page load event to take screenshot
  Page.loadEventFired(async () => {
    // If the `full` CLI option was passed, we need to measure the height of
    // the rendered page and use Emulation.setVisibleSize
    if (fullPage) {
      const {root: {nodeId: documentNodeId}} = await DOM.getDocument();
      const {nodeId: bodyNodeId} = await DOM.querySelector({
        selector: 'body',
        nodeId: documentNodeId,
      });
      const {model: {height}} = await DOM.getBoxModel({nodeId: bodyNodeId});

      await Emulation.setVisibleSize({width: viewportWidth, height: height});
      // This forceViewport call ensures that content outside the viewport is
      // rendered, otherwise it shows up as grey. Possibly a bug?
      await Emulation.forceViewport({x: 0, y: 0, scale: 1});
    }

    setTimeout(async function() {
      const screenshot = await Page.captureScreenshot({format});
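      // Buffer.from replaces the deprecated new Buffer() constructor; the data is
      // already decoded, so writeFile needs no encoding argument.
      const buffer = Buffer.from(screenshot.data, 'base64');
      file.writeFile('output.png', buffer, function(err) {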
        if (err) {
          console.error(err);
        } else {
          console.log('Screenshot saved');
        }
        client.close();
      });
    }, delay);
  });
}).on('error', err => {
  console.error('Cannot connect to browser:', err);
});
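The fixed `--delay` is a guess at how long the fonts take. An alternative, sketched here under the assumption that the target Chrome supports the CSS Font Loading API (`document.fonts`) and the `awaitPromise` option of `Runtime.evaluate`, is to wait until the page reports that font loading has finished before calling `Page.captureScreenshot`. The helper name `waitForFonts` is hypothetical; it takes the `Runtime` domain already destructured from `client` in screenshot.js.

```javascript
// Sketch: resolve once document.fonts reports that pending font loads are done.
// `Runtime` is the chrome-remote-interface domain (client.Runtime).
async function waitForFonts(Runtime) {
  const {result, exceptionDetails} = await Runtime.evaluate({
    // document.fonts.ready is a promise that settles when font loading finishes.
    expression: 'document.fonts.ready.then(() => document.fonts.status)',
    awaitPromise: true,   // wait for the promise instead of returning it
    returnByValue: true,  // return the plain string, not a remote object handle
  });
  if (exceptionDetails) {
    throw new Error('Font check failed: ' + JSON.stringify(exceptionDetails));
  }
  return result.value; // expected to be "loaded"
}
```

Inside the `Page.loadEventFired` handler, `await waitForFonts(Runtime);` would go just before `Page.captureScreenshot`, with `--delay` kept as a safety margin if needed.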

Running Chrome as background process

nohup google-chrome --headless --hide-scrollbars --remote-debugging-port=9222 --disable-gpu &

Note: --disable-gpu is currently required, see here
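Before running the screenshot script, it can help to confirm that Chrome is actually listening on the debugging port. Chrome's remote-debugging HTTP server exposes a `/json/version` endpoint; this sketch fetches it with Node's built-in `http` module and extracts the major version (headless builds typically report a `Browser` value such as `HeadlessChrome/58.0.3029.110`). The helper names `fetchVersionInfo` and `chromeMajorVersion` are hypothetical.

```javascript
const http = require('http');

// Hypothetical helper: major Chrome version from the /json/version payload,
// e.g. {"Browser": "HeadlessChrome/58.0.3029.110", ...} -> 58.
function chromeMajorVersion(info) {
  const match = /Chrome\/(\d+)\./.exec(info.Browser || '');
  return match ? Number(match[1]) : null;
}

// Fetch http://<host>:<port>/json/version from the running Chrome instance.
function fetchVersionInfo(port = 9222, host = 'localhost') {
  return new Promise((resolve, reject) => {
    http.get({host, port, path: '/json/version'}, (res) => {
      let body = '';
      res.setEncoding('utf8');
      res.on('data', (chunk) => { body += chunk; });
      res.on('end', () => {
        try {
          resolve(JSON.parse(body));
        } catch (err) {
          reject(err); // not JSON: probably not Chrome on this port
        }
      });
    }).on('error', reject); // e.g. ECONNREFUSED if Chrome isn't running
  });
}
```

For example, `fetchVersionInfo().then((info) => console.log(info.Browser, chromeMajorVersion(info)))` should print the headless browser string and `58` or higher; a connection error means the `nohup google-chrome ...` process is not running.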

Taking a screenshot

node screenshot.js --url="http://castellsonclaret.com/public/external/georgiapro/demo.htm" --outFile="screenshot.png" --format="jpeg" --viewportWidth=1440 --viewportHeight=900 --delay=1000
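The original screenshot.js shown above never reads `--outFile` and always writes `output.png`, so with `--format="jpeg"` it stores JPEG data under a .png name. A hypothetical helper like this one (not part of schnerd's script) picks an output path whose extension matches the format; it would replace the hard-coded `'output.png'` in the `writeFile` call.

```javascript
const path = require('path');

// Hypothetical helper: choose an output file whose extension matches the format.
// argv is the object produced by minimist in screenshot.js.
function outputPath(argv) {
  const format = argv.format === 'jpeg' ? 'jpeg' : 'png';
  const allowed = format === 'jpeg' ? ['.jpg', '.jpeg'] : ['.png'];
  if (!argv.outFile) {
    return 'output' + allowed[0];
  }
  const parsed = path.parse(argv.outFile);
  if (allowed.includes(parsed.ext.toLowerCase())) {
    return argv.outFile; // extension already matches the format
  }
  // Swap a mismatched extension, e.g. screenshot.png with --format=jpeg.
  return path.join(parsed.dir, parsed.name + allowed[0]);
}
```

With the command above, `outputPath(argv)` would yield `screenshot.jpg`.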

Results

WOFF demo:

[Image: screenshot of the WOFF demo]

Browser capabilities test:

[Image: screenshot of the browser capabilities test]

Drakes

You should try to use an abstraction so you don't have to deal with the boilerplate that comes with driving the Chrome DevTools Protocol (CDP) directly.

I've started a project that does just that (and a few other things as well): https://github.com/joelgriffith/navalia. I'd be happy to fix anything you find missing (just open an issue!)

browserless