I'm new to Nightmare/PhantomJS and am struggling to get a simple inventory of all the tags on a given page. I'm running on Ubuntu 14.04 after building PhantomJS from source and installing NodeJS, Nightmare and so forth manually, and other functions seem to be working as I expect.

Here's the code I'm using:

var Nightmare = require('nightmare');
new Nightmare()
  .goto("http://www.google.com")
  .wait()
  .evaluate(function ()
  {
    var a = document.getElementsByTagName("*");
    return a;
  },
  function (i)
  {
    for (var index = 0; index < i.length; index++)
    {
      if (i[index])
        console.log("Element " + index + ": " + i[index].nodeName);
    }
  })
  .run(function (err, nightmare)
  {
    if (err)
      console.log(err);
  });

When I run this inside a "real" browser, I get a list of all the tag types on the page (HTML, HEAD, BODY, ...). When I run this using node GetTags.js, I just get a single line of output:

Element 0: HTML

I'm sure it's a newbie problem, but what am I doing wrong here?

Valerie R

1 Answer

PhantomJS has two contexts. The page context, which provides access to the DOM, can only be reached through evaluate(), so variables must be explicitly passed in and out of it. There is a limitation, though (docs):

Note: The arguments and the return value to the evaluate function must be a simple primitive object. The rule of thumb: if it can be serialized via JSON, then it is fine.

Closures, functions, DOM nodes, etc. will not work!
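
The JSON rule of thumb can be illustrated in plain Node, without PhantomJS. This is only a sketch: crossBoundary is a hypothetical helper that imitates the context boundary with a JSON round trip, and PhantomJS's real serialization may differ in its details.

```javascript
// Hypothetical helper (not part of PhantomJS or Nightmare): imitates passing
// a value between contexts by serializing it to JSON and parsing it back.
function crossBoundary(value) {
  return JSON.parse(JSON.stringify(value));
}

// Plain data (strings, numbers, arrays, simple objects) survives intact.
var plain = crossBoundary({ tag: "DIV", count: 3 });
console.log(JSON.stringify(plain)); // {"tag":"DIV","count":3}

// A function-valued property is silently dropped, with no error raised.
var withFunction = crossBoundary({
  tag: "DIV",
  describe: function () { return "a div"; }
});
console.log(JSON.stringify(withFunction)); // {"tag":"DIV"}
```

Note that nothing throws when data is lost this way, which fits with seeing truncated output rather than an error.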

Nightmare's evaluate() function is only a wrapper around the PhantomJS function of the same name. This means that you will need to work with the elements in the page context and only pass a representation to the outside. For example:

.evaluate(function () 
{
    var a = document.getElementsByTagName("div");
    return a.length;
}, 
function(i) 
{
    console.log(i + " divs available");
})
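
Applied to the original question, the same idea might look like the sketch below: collect the tag names as plain strings inside the page context and return only that array. collectTagNames and fakeDocument are hypothetical names introduced for illustration. The optional doc parameter exists only so the function can be tried in plain Node against a stand-in object; inside the page it would fall back to the real document.

```javascript
// Hypothetical helper: meant to run in the page context. It returns an array
// of strings, which is JSON-serializable, instead of the DOM nodes themselves.
function collectTagNames(doc) {
  doc = doc || document; // in the page context, no argument is needed
  var all = doc.getElementsByTagName("*");
  var names = [];
  for (var i = 0; i < all.length; i++) {
    names.push(all[i].nodeName);
  }
  return names;
}

// Assumed usage with the Nightmare API shown in the question (not run here):
//   .evaluate(collectTagNames, function (names) {
//     names.forEach(function (name, index) {
//       console.log("Element " + index + ": " + name);
//     });
//   })

// Stand-in for a real document, so the helper can be exercised in Node.
var fakeDocument = {
  getElementsByTagName: function () {
    return [{ nodeName: "HTML" }, { nodeName: "HEAD" }, { nodeName: "BODY" }];
  }
};

var names = collectTagNames(fakeDocument);
console.log(JSON.stringify(names)); // ["HTML","HEAD","BODY"]
```

Returning an array rather than a hand-built JSON string leaves the serialization to evaluate() itself, so the callback receives a ready-to-use array.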
Artjom B.
  • Thanks for the quick reply! – Valerie R Jun 16 '15 at 20:30
  • Thank you very much...really clears things up for me. I changed my code to essentially build a JSON string of the information I want in the part you describe as the "page context", and then return it to the script. I suppose the part that threw me was that I was getting a single line of output rather than an error of some sort...anyway, thanks again! – Valerie R Jun 16 '15 at 20:46
  • Is this still working? For some reason it doesn't work for me. I'm unable to console.log() like that. – RetroCode Aug 31 '16 at 08:29
    @RetroCode I haven't tried, but I've looked into the code for v2 and it still looks like it should work this way. Remember that Nightmare v2 was basically a full rewrite with promises and Electron instead of PhantomJS. – Artjom B. Aug 31 '16 at 18:07
  • @ArtjomB. Does Nightmare v2 still have the restriction "The arguments and the return value to the evaluate function must be a simple primitive object. The rule of thumb: if it can be serialized via JSON, then it is fine"? – Just a learner Jan 17 '17 at 17:47
  • @OgrishMan Sorry, I haven't tried it and I'm not familiar with how Electron handles this. This was essentially a restriction of PhantomJS and not of Nightmare. – Artjom B. Jan 17 '17 at 21:23