I need to traverse forms on a site and save intermediate results to files. I'm using phantomjs' page.evaluate, but I'm having trouble accessing the filesystem from within page.evaluate's sandboxed environment. I have something like this:

for (var i = 0; i<option1.length; i++){
    for (var ii = 0; ii<option2.length; ii++){
        for (var iii = 0; iii<option3.length; iii++){
        ...
            //I found what I want to save
            fs.write("someFileName", someData);
        }
    }
}

Obviously, I don't have access to nodejs' fs from within page.evaluate, so the above does not work. I seem to have a few options:

  • Store everything I need to write to an array, and return that from the page.evaluate context into the outer, nodejs context, then save it from there. This would require memory I don't have.
  • Break up the above logic into smaller page.evaluate calls that each return a single piece of data to save to the filesystem.
  • Somehow pass into the page.evaluate a magic function to write to the filesystem. This seems to not be possible (if I try to pass in a function that calls fs.writeFile for example, I get that fs is undefined, even if fs is a free variable in the function I passed?)
  • Return an iterator which, when pulled, yields the next piece of data to be written
  • Set up a trivial web server on localhost that simply accepts POST requests and writes their contents to the filesystem. The page.evaluate code would then make those requests to localhost. I almost tried this, but I'm not sure whether I'd be affected by the same-origin policy.

What are my options here?

ealfonso

1 Answer

Your evaluation is sound, but you missed one option: onCallback. You can register an onCallback handler in the phantom context and push your data from the page context to a file through this callback:

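var fs = require('fs'); // PhantomJS's own fs module, not Node's

// Runs in the phantom context whenever the page calls window.callPhantom()
page.onCallback = function(data) {
    if (!data.file) {
        data.file = "defaultFilename.txt";
    }
    if (!data.mode) {
        data.mode = "w"; // overwrite unless the caller asks for something else
    }
    fs.write(data.file, data.str, data.mode);
};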

...
page.evaluate(function(){
    for (var i = 0; i<option1.length; i++){
        for (var ii = 0; ii<option2.length; ii++){
            for (var iii = 0; iii<option3.length; iii++){
            ...
                // save data
                if (typeof window.callPhantom === 'function') {
                    window.callPhantom({ file: "someFileName", str: someData, mode: "a" }); // append
                }
            }
        }
    }
});

Note that PhantomJS does not run in Node.js, although there are bridges between Node.js and PhantomJS. See also my answer here.
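
As for why passing a function that calls fs into page.evaluate cannot work: the function is serialized to its source text and rebuilt inside the page, so any free variable it closed over is gone. The following is only a rough simulation of that round trip in plain JavaScript, not PhantomJS's actual code. The names `diskWriter`, `save` and `demoLostClosure` are made up for illustration, and `diskWriter` merely stands in for `fs`.

```javascript
// Simulates what happens to a closure when only its source text crosses
// into another context. All names here are hypothetical.
function demoLostClosure() {
    // Stand-in for the phantom-side fs module.
    var diskWriter = {
        write: function (file, str) { return 'wrote ' + str + ' to ' + file; }
    };

    // `diskWriter` is a free variable inside save().
    function save(data) {
        return diskWriter.write('someFileName', data);
    }

    // Rebuild the function from its source, as a sandbox would have to.
    // new Function() evaluates in the global scope, so the closure is lost.
    var rebuilt = new Function('return (' + save.toString() + ');')();

    var result = { direct: save('x'), rebuilt: null };
    try {
        result.rebuilt = rebuilt('x');
    } catch (e) {
        result.rebuilt = e.name; // the free variable can no longer be resolved
    }
    return result;
}

var outcome = demoLostClosure();
console.log(outcome.direct);
console.log(outcome.rebuilt);
```

The direct call succeeds because the closure is intact, while the rebuilt copy fails to resolve its free variable. That is the same symptom as "fs is undefined" inside page.evaluate, and it is why the data has to be pushed out through callPhantom instead.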

Artjom B.
  • I mean, which way of writing data to disk from the page context you choose is primarily opinion-based. I can suggest one that might be a good fit for your use case, but it might not suit the next person's. Everyone has to decide for themselves how they want to do it. I can't say *use this every time*. – Artjom B. Dec 15 '14 at 16:41
  • yeah. I think this bridge between Node.js and PhantomJS is mainly what people will get out of your answer. – ealfonso Dec 15 '14 at 16:43
  • Really? I haven't written about any bridges. The code is vanilla PhantomJS. Code for PhantomJS bridge for Node.js will be slightly different. – Artjom B. Dec 15 '14 at 16:46
  • Sorry, it was my confusion. I see that PhantomJS doesn't run in Node.js. What I meant was the "bridge" between the `page.evaluate` context and the outer phantomjs context. – ealfonso Dec 15 '14 at 17:03
  • Can I register multiple `onCallback` functions? Or only one? – ealfonso Dec 16 '14 at 21:30
  • Only one, but you should be able to distinguish multiple actions based on the object properties that are passed. – Artjom B. Dec 16 '14 at 21:32
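
Since only one onCallback handler can be registered, one way to distinguish multiple actions is to route on a property of the passed object. This is a minimal sketch under assumptions: the `action` property, `makeDispatcher`, and the `write`/`log` handler names are invented for illustration, and the in-memory `files` and `logLines` stand in for real file access so the snippet runs outside PhantomJS.

```javascript
// Hypothetical helper: builds a single callback that routes each message
// to a handler chosen by its `action` property.
function makeDispatcher(handlers) {
    return function (msg) {
        var action = msg && msg.action;
        if (!Object.prototype.hasOwnProperty.call(handlers, action)) {
            throw new Error('Unknown action: ' + action);
        }
        return handlers[action](msg);
    };
}

// In-memory stand-ins; in a PhantomJS script the `write` handler would
// instead call fs.write(msg.file, msg.str, msg.mode).
var files = {};
var logLines = [];

var onCallback = makeDispatcher({
    write: function (msg) {
        // Mode "a" appends, anything else overwrites.
        var previous = msg.mode === 'a' ? (files[msg.file] || '') : '';
        files[msg.file] = previous + msg.str;
    },
    log: function (msg) {
        logLines.push(msg.text);
    }
});

// In PhantomJS this would be assigned as: page.onCallback = onCallback;
onCallback({ action: 'write', file: 'out.txt', str: 'a', mode: 'a' });
onCallback({ action: 'write', file: 'out.txt', str: 'b', mode: 'a' });
onCallback({ action: 'log', text: 'saved two chunks' });
```

From the page context, each call would then look like window.callPhantom({ action: "write", file: "someFileName", str: someData, mode: "a" }).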