
A little background: I'm fairly new to JavaScript and to PhantomJS, so I don't know whether this is a JavaScript or a PhantomJS bug (or feature?).

The following runs successfully (sorry for the missing phantom.exit() on the success path; you'll have to press Ctrl+C once it's done):

var page = require('webpage').create();
var comment = "Hello World";

page.viewportSize = { width: 800, height: 600 };
page.open("http://www.google.com", function (status) { 
    if (status !== 'success') {
        console.log('Unable to load the address!');
        phantom.exit();
    } else {
        page.includeJs('http://code.jquery.com/jquery-latest.min.js', function() {
            console.log("1: ", comment);
        }, comment);

        var foo = page.evaluate(function() {            
            return arguments[0];
        }, comment);

        console.log("2: ", foo);            
    }
});

This works:

page.includeJs('http://code.jquery.com/jquery-latest.min.js', function() {
    console.log("1: ", comment);
}, comment);

Output: 1: Hello World

But not:

page.includeJs('http://code.jquery.com/jquery-latest.min.js', function(c) {
    console.log("1: ", c);
}, comment);

Output: 1: http://code.jquery.com/jquery-latest.min.js

And not:

page.includeJs('http://code.jquery.com/jquery-latest.min.js', function() {
    console.log("1: ", arguments[0]);
}, comment);

Output: 1: http://code.jquery.com/jquery-latest.min.js

Looking at the 2nd piece, this works:

var foo = page.evaluate(function() {            
    return arguments[0];
}, comment);

console.log("2: ", foo);

Output: 2: Hello World

And this:

var foo = page.evaluate(function(c) {           
    return c;
}, comment);

console.log("2: ", foo);

Output: 2: Hello World

But not this:

var foo = page.evaluate(function() {            
    return comment;
}, comment);

console.log("2: ", foo);

Output:

ReferenceError: Can't find variable: comment
phantomjs://webpage.evaluate():2
phantomjs://webpage.evaluate():3
phantomjs://webpage.evaluate():3

2: null

The good news is, I know what works and what doesn't, but how about a little consistency?

Why the difference between includeJs and evaluate?

Which is the proper way to pass arguments to an anonymous function?

Anders

1 Answer


The tricky thing to understand with PhantomJS is that there are two execution contexts: the Phantom context, which is local to your machine and has access to the phantom object and required modules, and the remote context, which exists within the window of the headless browser and only has access to things loaded in the web pages you open via page.open.

Most of the script you write is executed in the Phantom context. The main exception is anything within page.evaluate(function() { ... }). The ... here is executed in the remote context, which is sandboxed, without access to the variables and objects in your local context. You can move data between the two contexts by:

  • Returning a value from the function passed to page.evaluate(), or
  • Passing arguments in to that function.
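To make the sandboxing concrete, here is a small simulation that runs in plain Node.js rather than PhantomJS. `simulatedEvaluate` is a hypothetical helper and not PhantomJS's implementation: it assumes the function is rebuilt from its source text (so it loses its closure) and that arguments and the return value are copied by value, which it approximates with a JSON round trip. It reproduces the three evaluate cases from the question.

```javascript
// Illustrative simulation only; NOT how PhantomJS implements page.evaluate.
// Assumptions: (1) the function is rebuilt from its source, losing its closure;
// (2) arguments and the return value are copied, approximated here with JSON.
function simulatedEvaluate(fn, ...args) {
  // new Function() creates a function scoped to the global scope, so the
  // caller's local variables are not visible inside the rebuilt function.
  const remoteFn = new Function("return (" + fn.toString() + ");")();
  const remoteArgs = JSON.parse(JSON.stringify(args));
  try {
    const result = remoteFn.apply(null, remoteArgs);
    // JSON.stringify(undefined) is undefined, so map "no value" to null.
    return result === undefined ? null : JSON.parse(JSON.stringify(result));
  } catch (err) {
    // Loosely mimics the question: the error is reported and null comes back.
    console.log(err.name + ": " + err.message);
    return null;
  }
}

function demo() {
  const comment = "Hello World"; // local to demo(), like a Phantom-side variable
  return {
    // Works: the value arrives as the first positional argument.
    viaArguments: simulatedEvaluate(function () { return arguments[0]; }, comment),
    // Works: the parameter name does not have to match the caller's variable.
    viaParameter: simulatedEvaluate(function (c) { return c; }, comment),
    // Fails: `comment` is not in scope once the function is rebuilt.
    viaClosure: simulatedEvaluate(function () { return comment; }, comment),
  };
}

const results = demo();
console.log(results);
```

Rebuilding the function with `new Function` is what cuts it off from `demo()`'s locals, which is the same effect the sandbox has on `comment` in the question.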

The values passed this way are essentially serialized in each direction: you can't pass a complex object with methods, only a data object like a string or an array. (I don't know the exact implementation, but the rule of thumb seems to be that anything you can serialize with JSON can be passed in either direction.) Unlike standard JavaScript, the function does not have access to variables outside page.evaluate(), only to the values you explicitly pass in as arguments.
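The JSON rule of thumb can be checked directly in plain JavaScript. This sketch assumes, as the answer does, that a JSON round trip is a reasonable approximation of what survives the trip between contexts; PhantomJS's actual conversion may differ in edge cases. `roundTrip` is a hypothetical helper written only for illustration.

```javascript
// Hypothetical helper: approximates crossing the Phantom/remote boundary.
function roundTrip(value) {
  return JSON.parse(JSON.stringify(value));
}

const sent = {
  title: "Hello World",                 // strings survive
  tags: ["a", "b"],                     // arrays survive
  nested: { count: 3 },                 // plain data objects survive
  greet: function () { return "hi"; },  // functions/methods are dropped
  when: new Date(0),                    // Dates arrive as ISO strings
};

const received = roundTrip(sent);
console.log(received);
```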

So, your question: Why the difference between includeJs and evaluate?

  • .includeJs(url, callback) takes a callback function that executes within the Phantom context, apparently receiving the URL as its first argument. In addition to its arguments, it has access (like any normal JavaScript function) to all variables in its enclosing scope, including comment in your example. It does not take an additional argument list after the callback function: when you reference comment within the callback, you're referencing an outer variable, not a function argument.

    var foo = "stuff";
    page.includeJs('http://code.jquery.com/jquery-latest.min.js', function() {
        // this callback function executes in the Phantom context
        console.log("jQuery is loaded in the remote context.");
        // it has access to outer-scope variables, including "phantom"
        nowDoMoreStuff(foo, page);
    });
    
  • .evaluate(function, args*) takes a function to execute and zero or more arguments to pass to it (in some serialized form). You need to name the arguments in the function signature, e.g. function(a,b,c), or use the arguments object to access them - they won't automagically have the same names as the variables you pass in.

    var foo = "stuff";
    var bar = "stuff for the remote page";
    
    var result = page.evaluate(function(bar2) {
        // this function executes in the remote context
        // it has access to the DOM, remote libraries, and args you pass in
        $('title').html(bar2);
        // but not to outer-scope vars
        return typeof foo + " " + typeof bar;
    }, bar);
    
    console.log(result); // "undefined undefined"
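The question's includeJs results follow from the callback being called with the URL. The sketch below uses a hypothetical `mockIncludeJs`, written only for illustration and runnable in plain Node.js; it assumes the real method invokes the callback once, asynchronously, with the script URL, and ignores anything passed after the callback.

```javascript
// Hypothetical stand-in for page.includeJs, NOT PhantomJS's source.
// Assumed behavior: the callback runs later (asynchronously) and receives
// the script URL; extra arguments after the callback are simply ignored.
function mockIncludeJs(url, callback) {
  setTimeout(function () {
    callback(url);
  }, 0);
}

const log = [];

function demo() {
  const comment = "Hello World";
  const url = "http://code.jquery.com/jquery-latest.min.js";

  // 1. Closure: `comment` is read from the enclosing scope.
  mockIncludeJs(url, function () {
    log.push("closure: " + comment);
  }, comment);

  // 2. Named parameter: `c` receives the URL, not `comment`.
  mockIncludeJs(url, function (c) {
    log.push("param: " + c);
  }, comment);

  // 3. arguments[0] is likewise the URL.
  mockIncludeJs(url, function () {
    log.push("arguments[0]: " + arguments[0]);
  }, comment);

  // Recorded before any callback runs, because the callbacks are deferred.
  log.push("after includeJs calls");
}

demo();
setTimeout(function () {
  console.log(log.join("\n"));
}, 10);
```

The trailing `comment` argument has no effect in any of the three calls; only the closure in case 1 sees its value.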
    

So the correct way to pass arguments in is different for these two methods. For includeJs, the callback is called with its own set of arguments (including, at least, the URL), so any variables you want to access need to be in the callback's enclosing scope (i.e. you access them through the function's closure). For evaluate, the only straightforward way to pass in arguments is to include them in the arguments passed to evaluate itself (there are other ways, but they're tricky and not worth discussing now that this feature is available in PhantomJS itself).

nrabinowitz
  • Fantastic, thank you! A very clear and descriptive answer, much appreciated. – Anders Sep 01 '12 at 00:30
  • great answer! The dual contexts was not obvious to me at first, and I refactored some code that was in a page.evaluate block out, trying to clean things up and make it reusable, not realizing that I wouldn't be able to pass around DOM elements to functions in my Phantom context. I'm currently reading through your PjScrape code to understand better patterns of organizing my code. Thanks for that! – J_McCaffrey Oct 16 '12 at 17:47
  • @nrabinowitz Also, I found out that page.includeJs is asynchronous, just an FYI for others who are processing a few pages in a row. – JackLeo Oct 30 '13 at 06:51
  • Yes, that's why it takes a callback. – nrabinowitz Oct 30 '13 at 17:01
  • Is the [document](http://phantomjs.org/api/webpage/method/include-js.html) wrong or misleading, then? It does stuff inside the callback as if it were in the remote context. – laggingreflex Mar 20 '15 at 02:21
  • This may have changed in later versions of Phantom, but the [example](https://github.com/ariya/phantomjs/blob/master/examples/phantomwebintro.js) still looks like it executes the callback in the Phantom context. So I'm thinking the docs are incorrect, though I haven't tested. – nrabinowitz Mar 25 '15 at 23:12
  • Is there any good way to log something inside the evaluate function, since it runs in a different context? – geckob Jun 16 '15 at 19:50