7

I am trying to get some information from Twitter using CasperJS, and I'm stuck on the infinite scroll. Even when I use jQuery to scroll the page down, nothing seems to work: neither scrolling nor triggering the exact event on window (something like uiNearTheBottom) helps. Interestingly, all of these attempts work when I run the same JS code from the console in Firefox and Chrome. Here's the example code:

casper.thenEvaluate(function(){
    $(window).trigger('uiNearTheBottom');
});

or

casper.thenEvaluate(function(){
    document.body.scrollTop  =  document.body.scrollHeight;
});
finder_sl
  • When CasperJS injects jQuery into the client-side page, it blocks content loaded by Twitter's infinite scrolling. This is a site specific issue. Please see my answer below for a solution. – tim-montague May 05 '14 at 08:36

4 Answers

4

If casper.scrollToBottom() (or casper.scroll_to_bottom()) fails you, the line below should work instead:

this.page.scrollPosition = { top: this.page.scrollPosition["top"] + document.body.scrollHeight, left: 0 };

A working example:

casper.start(url, function () {
    this.wait(10000, function () {
        this.page.scrollPosition = { top: this.page.scrollPosition["top"] + document.body.scrollHeight, left: 0 };
        if (this.visible("div.load-more")) {
            this.echo("I am here");
        }
    });
});

It sets the scrollPosition property of the underlying PhantomJS page object.
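The comment below asks whether `document.body.scrollHeight` is being read in the Casper context rather than in the page. A minimal sketch of one way to sidestep that question is to measure the height inside `casper.evaluate` and then set `scrollPosition` from the Casper side. The helper name `scrollToPageBottom` is hypothetical, and setting `top` to the full page height (rather than adding it to the current position) is an assumption of this sketch, not the answer's exact code.

```javascript
// Hypothetical helper (not part of CasperJS): scroll the PhantomJS
// viewport to the bottom of the currently loaded page.
function scrollToPageBottom(casper) {
    // Measure the height inside the page context, where `document`
    // refers to the loaded page rather than the Casper script.
    var pageHeight = casper.evaluate(function () {
        return document.body.scrollHeight;
    });
    // Move the viewport through the underlying PhantomJS page object.
    casper.page.scrollPosition = { top: pageHeight, left: 0 };
    return pageHeight;
}
```

Inside a step it could be called as `scrollToPageBottom(this);`, followed by a `this.wait(...)` to give the new content time to load.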

iChux
  • Are you sure `document.body.scrollHeight` is in Casper context and not inside of a `casper.evaluate`? – Artjom B. Nov 10 '14 at 08:47
  • @ArtjomB. I have added working code. In fact, I'm presently using it in a scraping job that I am doing. It involves calling the underlying code as found in PhantomJS. – iChux Nov 10 '14 at 09:22
  • There's now a working copy of Twitter scraping with CasperJS at https://gist.github.com/nwaomachux/35d1c424966fccd16ae1 – iChux Jan 27 '15 at 16:00
2

CasperJS is based on PhantomJS, and according to the discussion linked below, no window object exists for the headless browser.

You can check the discussion here

geekonweb
  • At least `document` exists in the page context, and the first scroll does work. But the tweets do not load. – finder_sl Jul 08 '13 at 08:24
1

On Twitter you can use:

casper.scrollToBottom();
casper.wait(1000, function () {
    casper.capture("loadedContent.png");
});

But if you include jQuery as a client script, as shown here, the above code won't work!

var casper = require('casper').create({
    clientScripts: [
        'jquery-1.11.0.min.js'
    ]
});

The script injection blocks Twitter's infinite scroll from loading content. On BoingBoing.net, CasperJS scrollToBottom() works with jQuery without blocking. It really depends on the site.

However, you can inject jQuery after the content has loaded.

casper.scrollToBottom();
casper.wait(1000, function () {
    casper.capture("loadedContent.png");

    // Inject client-side jQuery library
    casper.options.clientScripts.push("jquery.js");

    // And use like so...
    var height = casper.evaluate(function () {
        return $(document).height();
    });
});
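If pushing onto `casper.options.clientScripts` does not affect the page that is already loaded, one alternative worth trying is PhantomJS's `page.injectJs`, which loads a local file into the current page and returns a boolean. The sketch below assumes that behaviour; `injectJQueryLate` is a hypothetical helper name and the script path is whatever local jQuery file you have.

```javascript
// Hypothetical helper (not part of CasperJS): inject a local jQuery file
// into the already-loaded page, then use it to read the document height.
function injectJQueryLate(casper, scriptPath) {
    // injectJs returns true when the file was found and injected.
    var injected = casper.page.injectJs(scriptPath);
    if (!injected) {
        return null; // injection failed, so `$` cannot be relied on
    }
    // `$` is only referenced inside the page context.
    return casper.evaluate(function () {
        return $(document).height();
    });
}
```

Inside the `casper.wait` callback above, this could be called as `var height = injectJQueryLate(casper, "jquery.js");` once the scrolled content has had time to load.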
tim-montague
0

I have adapted this from a previous answer:

var iterations = 5; // number of pages to go through
var timeToWait = 2000; // time to wait in milliseconds

var last;
var list = [];

// declare i with var so it does not leak as an implicit global
for (var i = 0; i <= iterations; i++) {
    list.push(i);
}

//evaluate this in the browser context and pass the timer back to casperjs
casper.thenEvaluate(function(iters, waitTime) {
    window.x = 0;
    var intervalID = setInterval(function() {
        console.log("Using setInterval " + window.x);
        window.scrollTo(0, document.body.scrollHeight); 

        if (++window.x === iters) {
            window.clearInterval(intervalID);
        }
    }, waitTime);
}, iterations, timeToWait);

casper.each(list, function(self, i) {

    self.wait(timeToWait, function() {
        last = i;
        this.echo('Using this.wait ' + i);
    });

});

casper.waitFor(function() {
    return (last === list[list.length - 1] && iterations === this.getGlobal('x'));
}, function() {
    this.echo('All done.');
});

Essentially what happens is I enter the page context, scroll to the bottom, and then wait 2 seconds for the content to load. Obviously I would have liked to use repeated applications of casper.scrollToBottom() or something more sophisticated, but the loading time wasn't allowing me to make this happen.
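For the "repeated applications of casper.scrollToBottom()" idea, a rough sketch is to scroll, wait until the page height grows, and repeat until it stops growing or a cap is reached. This is only one possible approach under stated assumptions: `pageHeight` and `scrollUntilDone` are hypothetical helper names, and using a growing `document.body.scrollHeight` as the signal that new content has loaded is an assumption that may not hold on every site.

```javascript
// Hypothetical helper: read the page height from the page context.
function pageHeight(casper) {
    return casper.evaluate(function () {
        return document.body.scrollHeight;
    });
}

// Hypothetical helper: scroll to the bottom repeatedly, at most maxScrolls
// times, waiting up to `timeout` ms after each scroll for the page to grow.
function scrollUntilDone(casper, maxScrolls, timeout) {
    if (maxScrolls <= 0) {
        return;
    }
    var previousHeight = pageHeight(casper);
    casper.scrollToBottom();
    casper.waitFor(function contentGrew() {
        // New content is assumed to make the page taller.
        return pageHeight(casper) > previousHeight;
    }, function onGrowth() {
        scrollUntilDone(casper, maxScrolls - 1, timeout);
    }, function onTimeout() {
        casper.echo('No new content after scrolling; stopping.');
    }, timeout);
}
```

It could be started from a step with `casper.then(function () { scrollUntilDone(this, 5, 2000); });`, which replaces the fixed two-second wait with a wait that ends as soon as the page has grown.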

anguyen