
I am using Selenium WebDriver to crawl a website that has infinite scroll (this is only an example; I will be crawling other websites too).

Problem statement:

Using Selenium WebDriver, scroll down an infinite-scroll page until the content stops loading.

My approach: currently I am doing this:

Step 1: Scroll to the page bottom

JavascriptExecutor js = (JavascriptExecutor) driver;
//Scroll to the largest of the candidate page heights. (The earlier "window.onload=toBottom()" form
//called toBottom() immediately and also overwrote any existing window.onload handler with undefined.)
js.executeScript("window.scrollTo(0,Math.max(document.documentElement.scrollHeight," +
                        "document.body.scrollHeight,document.documentElement.clientHeight));");

Then I wait for some time to let the Ajax request complete, like this:

Step 2: Explicitly wait for Ajax request to be over

Thread.sleep(1000);

Then I run another JavaScript snippet to check whether the page can still be scrolled:

Step 3: Check whether the page is still scrollable

//document.height is obsolete; document.body.clientHeight is used as the alternative
//refer to https://developer.mozilla.org/en-US/docs/DOM/document.height

    //executeScript returns a Long or a Double (pageYOffset can be fractional), so cast to Number
    if(((Number)js.executeScript("return " +
                                "(document.body.clientHeight-(window.pageYOffset + window.innerHeight))")).doubleValue()>0)

If the above condition is true, I repeat Steps 1 to 3 until the condition in Step 3 is false.
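The scroll-and-check loop above can be sketched in Java roughly as follows. This is a hypothetical helper, not a Selenium API: the class and method names are made up, and `run` stands in for `JavascriptExecutor.executeScript` (in real use, `s -> js.executeScript(s)`), which also lets the loop be exercised without a browser. Instead of Step 3's `clientHeight` check, it compares the document's `scrollHeight` before and after each scroll, and it replaces the single fixed `Thread.sleep(1000)` with several short polls that stop as soon as the page grows. The timing parameters are illustrative assumptions.

```java
import java.util.function.Function;

// Hypothetical helper (not part of Selenium). `run` stands in for
// ((JavascriptExecutor) driver).executeScript, e.g. s -> js.executeScript(s).
final class InfiniteScroller {
    static final String SCROLL =
        "window.scrollTo(0, Math.max(document.documentElement.scrollHeight, document.body.scrollHeight));";
    static final String HEIGHT =
        "return Math.max(document.documentElement.scrollHeight, document.body.scrollHeight);";

    // Scrolls to the bottom, then polls the page height up to `polls` times, `pollMillis` apart.
    // Stops when a scroll produces no growth (or after `maxScrolls` scrolls).
    // Returns how many scrolls made the page grow.
    static int scrollToEnd(Function<String, Object> run, long pollMillis, int polls, int maxScrolls) {
        long lastHeight = ((Number) run.apply(HEIGHT)).longValue();
        int grew = 0;
        for (int i = 0; i < maxScrolls; i++) {
            run.apply(SCROLL);
            boolean changed = false;
            for (int p = 0; p < polls && !changed; p++) {
                try {
                    Thread.sleep(pollMillis);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return grew;
                }
                long height = ((Number) run.apply(HEIGHT)).longValue();
                if (height > lastHeight) {
                    lastHeight = height;
                    changed = true;
                }
            }
            if (!changed) {
                break; // nothing new arrived within polls * pollMillis: assume the feed has ended
            }
            grew++;
        }
        return grew;
    }
}
```

For example, `InfiniteScroller.scrollToEnd(s -> js.executeScript(s), 250, 8, 500)` gives each scroll up to 2 seconds to load new content. It still relies on a timeout to decide that the feed has ended, which is exactly the guesswork the Ajax-tracking ideas below try to remove.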

The problem: I do not want to use Thread.sleep(1000); in Step 2. Instead, I would like to check with JavaScript whether the background Ajax request has finished, and then scroll down further if the condition in Step 3 is true.

PS: I am not the developer of the page, so I do not have access to the code running it; I can only inject JavaScript into the page (as in Steps 1 and 3). Also, the logic has to be generic: it must work for any website that makes Ajax requests during infinite scroll.

I would be grateful if someone could spare some time here!

EDIT: OK, after struggling for 2 days, I have figured out that the pages I am crawling with Selenium WebDriver can use any of these JavaScript libraries, and I will have to poll differently for each library. For example, if the web application uses the jQuery API, I can wait for

(Long)((JavascriptExecutor)driver).executeScript("return jQuery.active")

to return zero.

Likewise, if the web application uses the Prototype JavaScript library, I have to wait for

(Long)((JavascriptExecutor)driver).executeScript("return Ajax.activeRequestCount")

to return zero.
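Both checks boil down to polling a request counter until it reaches zero. A minimal sketch of such a wait, assuming `run` is a stand-in for `JavascriptExecutor.executeScript` (for example `s -> js.executeScript(s)`); the class name, timeout and poll interval are hypothetical. If the library is not loaded on the page, `return jQuery.active` throws a ReferenceError that surfaces as an exception in Java, so a guarded expression such as `return window.jQuery ? jQuery.active : 0` is safer to pass in.

```java
import java.util.function.Function;

// Hypothetical helper: polls a counter script until it reports 0 or the timeout expires.
final class AjaxCounterWait {
    // counterScript example: "return window.jQuery ? jQuery.active : 0"
    // Returns true if the counter reached zero before the timeout, false otherwise.
    static boolean waitForZero(Function<String, Object> run, String counterScript,
                               long timeoutMillis, long pollMillis) {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (true) {
            Object value = run.apply(counterScript);
            if (value instanceof Number && ((Number) value).longValue() == 0) {
                return true;
            }
            if (System.currentTimeMillis() >= deadline) {
                return false;
            }
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }
}
```

Polling a counter only returns as soon as the count is zero; right after a scroll, the counter may still be zero because the page has not issued its request yet, so this works best combined with a check that the page actually grew.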

Now the problem is: how do I write generic code that handles most of the available JavaScript libraries?

Problems I am facing in implementing this:

1. How do I find out which JavaScript library the web application uses (with Selenium WebDriver in Java), so that I can write the corresponding wait method? Currently, I am using this:

Code

2. This way I would have to write as many as 77 methods, one per JavaScript library, so I need a better way to handle this scenario as well.
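One way to avoid writing one method per library is to make the checks data-driven. The sketch below is an assumption, not a known library: it keeps a table from each library's global name to a JavaScript "idle" expression, detects which globals exist on the page, and builds a single script in which absent libraries count as idle. Only jQuery (`jQuery.active`) and Prototype (`Ajax.activeRequestCount`) are filled in; as before, `run` stands in for `JavascriptExecutor.executeScript`. Libraries that expose no request counter cannot be covered this way and need XHR instrumentation instead.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical table-driven checks: supporting another library means adding one map entry.
final class LibraryIdleChecks {
    // global name -> JS expression that is true when that library has no requests in flight
    static final Map<String, String> IDLE_WHEN = new LinkedHashMap<>();
    static {
        IDLE_WHEN.put("jQuery", "!jQuery.active");
        IDLE_WHEN.put("Ajax", "!Ajax.activeRequestCount"); // Prototype
    }

    // Returns the globals from the table that exist on the current page.
    static List<String> detect(Function<String, Object> run) {
        List<String> found = new ArrayList<>();
        for (String global : IDLE_WHEN.keySet()) {
            Object present = run.apply("return typeof window['" + global + "'] !== 'undefined';");
            if (Boolean.TRUE.equals(present)) {
                found.add(global);
            }
        }
        return found;
    }

    // One script covering every library in the table; libraries that are absent count as idle.
    static String idleScript() {
        StringBuilder js = new StringBuilder("return true");
        for (Map.Entry<String, String> e : IDLE_WHEN.entrySet()) {
            js.append(" && (typeof window['").append(e.getKey())
              .append("'] === 'undefined' || ").append(e.getValue()).append(")");
        }
        return js.append(";").toString();
    }
}
```

`Boolean.TRUE.equals(js.executeScript(LibraryIdleChecks.idleScript()))` then answers "are all known libraries idle?" in a single round trip, and `detect` is only needed if the crawler wants to log which libraries it found.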

In short, I need to figure out, through Selenium WebDriver's Java bindings, whether the browser is making any call (Ajax or plain), with or without a JavaScript library.

PS: there are add-ons, Chrome's JavaScript Lib Detector and Firefox's JavaScript Library Detector, that detect which JavaScript library a page uses.

Ishank

3 Answers


For web pages that make Ajax requests during infinite scroll and use the jQuery API (or for other actions), inject the following after the web page has opened and before you start scrolling:

    //Inject the polling status variable
    js.executeScript("window.status = 'fail';");

    //Attach the Ajax callback method
    js.executeScript( "$(document).ajaxComplete(function() {" +
    "status = 'success';});");

Step 1: remains the same as in the original question.

Step 2: Poll the following script (this is what removes the need for Thread.sleep() and makes the logic more dynamic):

pollingLoop:
while (true) {
    String aStatus = (String) js.executeScript("return status;");

    if (aStatus != null && aStatus.equalsIgnoreCase("success")) {
        js.executeScript("status = 'fail';");   //reset the flag for the next scroll
        break pollingLoop;
    }
}

Step 3: No need now!

Conclusion: no need for a blunt Thread.sleep() again and again while using Selenium WebDriver!

This approach works only if the web application uses the jQuery API.
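Two refinements of the Step 2 loop are sketched below, under stated assumptions. First, the flag is read and reset in one `executeScript` call: page JavaScript cannot run in the middle of an injected script, so a completion that arrives between a separate "read" call and "reset" call can no longer be overwritten. Second, the loop has a timeout, because if a scroll triggers no Ajax request at all, `status` never becomes `'success'` and an unbounded loop would spin forever. The class name, timeout and poll interval are hypothetical, and `run` stands in for `JavascriptExecutor.executeScript`. The flag keeps the answer's `window.status` name for consistency, although that is also a legacy browser property, so a dedicated global name would be safer.

```java
import java.util.function.Function;

// Hypothetical helper for the ajaxComplete flag injected above.
final class AjaxFlagPoller {
    // Read the flag and reset it in the same script, so no completion is lost between calls.
    static final String READ_AND_RESET =
        "var s = window.status; window.status = 'fail'; return s;";

    // Polls until the page reports 'success' or the timeout expires; returns true on success.
    static boolean waitForAjaxComplete(Function<String, Object> run, long timeoutMillis, long pollMillis) {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        do {
            Object s = run.apply(READ_AND_RESET);
            if (s instanceof String && ((String) s).equalsIgnoreCase("success")) {
                return true;
            }
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        } while (System.currentTimeMillis() < deadline);
        return false; // no Ajax completion seen: possibly the end of the feed
    }
}
```

A `false` result can then double as the "no more content" signal that Step 3 used to provide.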

EDIT: Following the link given by @jayati, I injected this JavaScript:

JavaScript one:

//XMLHttpRequest instrumentation/wrapping
var startTracing = function (onnew) {
    var OldXHR = window.XMLHttpRequest;

    // create a wrapper object that has the same interfaces as a regular XMLHttpRequest object
    // see http://www.xulplanet.com/references/objref/XMLHttpRequest.html for reference on XHR object
    var NewXHR = function() {
        var self = this;
        var actualXHR = new OldXHR();

        // private callbacks (for UI):
        // onopen, onsend, onsetrequestheader, onupdate, ...
        this.requestHeaders = "";
        this.requestBody = "";

        // emulate methods from regular XMLHttpRequest object
        this.open = function(a, b, c, d, e) { 
            self.openMethod = a.toUpperCase();
            self.openURL = b;
            ajaxRequestStarted = 'open';

            if (self.onopen != null && typeof(self.onopen) == "function") { 
                self.onopen(a,b,c,d,e); } 
            return actualXHR.open(a,b,c,d,e); 
        }
        this.send = function(a) {
            ajaxRequestStarted = 'send';

            if (self.onsend != null && typeof(this.onsend) == "function") { 
                self.onsend(a); } 
            self.requestBody += a;
            return actualXHR.send(a); 
        }
        this.setRequestHeader = function(a, b) {
            if (self.onsetrequestheader != null && typeof(self.onsetrequestheader) == "function") { self.onsetrequestheader(a, b); } 
            self.requestHeaders += a + ":" + b + "\r\n";
            return actualXHR.setRequestHeader(a, b); 
        }
        this.getRequestHeader = function() {
            return actualXHR.getRequestHeader(); 
        }
        this.getResponseHeader = function(a) { return actualXHR.getResponseHeader(a); }
        this.getAllResponseHeaders = function() { return actualXHR.getAllResponseHeaders(); }
        this.abort = function() { return actualXHR.abort(); }
        this.addEventListener = function(a, b, c) { return actualXHR.addEventListener(a, b, c); }
        this.dispatchEvent = function(e) { return actualXHR.dispatchEvent(e); }
        this.openRequest = function(a, b, c, d, e) { return actualXHR.openRequest(a, b, c, d, e); }
        this.overrideMimeType = function(e) { return actualXHR.overrideMimeType(e); }
        this.removeEventListener = function(a, b, c) { return actualXHR.removeEventListener(a, b, c); }

        // copy the values from actualXHR back onto self
        function copyState() {
            // copy properties back from the actual XHR to the wrapper
            try {
                self.readyState = actualXHR.readyState;
            } catch (e) {}
            try {
                self.status = actualXHR.status;
            } catch (e) {}
            try {
                self.responseText = actualXHR.responseText;
            } catch (e) {}
            try {
                self.statusText = actualXHR.statusText;
            } catch (e) {}
            try {
                self.responseXML = actualXHR.responseXML;
            } catch (e) {}
        }

        // emulate callbacks from regular XMLHttpRequest object
        actualXHR.onreadystatechange = function() {
            copyState();

            try {
                if (self.onupdate != null && typeof(self.onupdate) == "function") { self.onupdate(); } 
            } catch (e) {}

            // onreadystatechange callback            
            if (self.onreadystatechange != null && typeof(self.onreadystatechange) == "function") { return self.onreadystatechange(); } 
        }
        actualXHR.onerror = function(e) {

            ajaxRequestComplete = 'err';
            copyState();

            try {
                if (self.onupdate != null && typeof(self.onupdate) == "function") { self.onupdate(); } 
            } catch (e) {}

            if (self.onerror != null && typeof(self.onerror) == "function") { 
                return self.onerror(e); 
            } else if (self.onreadystatechange != null && typeof(self.onreadystatechange) == "function") { 
                return self.onreadystatechange(); 
            }
        }
        actualXHR.onload = function(e) {

            ajaxRequestComplete = 'loaded';
            copyState();

            try {
                if (self.onupdate != null && typeof(self.onupdate) == "function") { self.onupdate(); } 
            } catch (e) {}

            if (self.onload != null && typeof(self.onload) == "function") { 
                return self.onload(e); 
            } else if (self.onreadystatechange != null && typeof(self.onreadystatechange) == "function") { 
                return self.onreadystatechange(); 
            }
        }
        actualXHR.onprogress = function(e) {
            copyState();

            try {
                if (self.onupdate != null && typeof(self.onupdate) == "function") { self.onupdate(); } 
            } catch (e) {}

            if (self.onprogress != null && typeof(self.onprogress) == "function") { 
                return self.onprogress(e);
            } else if (self.onreadystatechange != null && typeof(self.onreadystatechange) == "function") { 
                return self.onreadystatechange(); 
            }
        }

        if (onnew && typeof(onnew) == "function") { onnew(this); }
    }

    window.XMLHttpRequest = NewXHR;

}
window.ajaxRequestComplete = 'no';//Make as a global javascript variable
window.ajaxRequestStarted = 'no';
startTracing();

Or JavaScript two:

var startTracing = function (onnew) {
    window.ajaxRequestComplete = 'no';//Make as a global javascript variable
    window.ajaxRequestStarted = 'no';

    XMLHttpRequest.prototype.uniqueID = function() {
        if (!this.uniqueIDMemo) {
            this.uniqueIDMemo = Math.floor(Math.random() * 1000);
        }
        return this.uniqueIDMemo;
    }

    XMLHttpRequest.prototype.oldOpen = XMLHttpRequest.prototype.open;

    var newOpen = function(method, url, async, user, password) {

        ajaxRequestStarted = 'open';
        /*alert(ajaxRequestStarted);*/
        this.oldOpen(method, url, async, user, password);
    }

    XMLHttpRequest.prototype.open = newOpen;

    XMLHttpRequest.prototype.oldSend = XMLHttpRequest.prototype.send;

    var newSend = function(a) {
        var xhr = this;

        var onload = function() {
            ajaxRequestComplete = 'loaded';
            /*alert(ajaxRequestComplete);*/
        };

        var onerror = function( ) {
            ajaxRequestComplete = 'Err';
            /*alert(ajaxRequestComplete);*/
        };

        xhr.addEventListener("load", onload, false);
        xhr.addEventListener("error", onerror, false);

        xhr.oldSend(a);
    }

    XMLHttpRequest.prototype.send = newSend;
}
startTracing();

By checking the global variables ajaxRequestStarted and ajaxRequestComplete from the Java code, one can determine whether an Ajax request was started or completed.

Now I have a way to wait until an Ajax request is complete, and I can also find out whether an Ajax request was triggered by some action.
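Both scripts record only the state of the most recent request, so with several overlapping requests `ajaxRequestComplete` can read `'loaded'` while others are still in flight. A possible variant of the second script, sketched here as an assumption rather than a tested drop-in, counts in-flight XHRs instead: it increments in `send()` and decrements on `loadend`, which fires after load, error, abort and timeout. The global `__pendingXhr` and the class name are hypothetical; `fetch()` requests are not counted; requests started before injection are missed; and the script must be re-injected after every navigation. `run` again stands in for `JavascriptExecutor.executeScript`.

```java
import java.util.function.Function;

// Hypothetical counter of in-flight XMLHttpRequests (XHR only; fetch() is not covered).
final class XhrCounter {
    // Inject once per page load, before scrolling.
    static final String INSTALL =
        "(function () {"
        + " if (typeof window.__pendingXhr === 'number') { return; }"
        + " window.__pendingXhr = 0;"
        + " var origSend = XMLHttpRequest.prototype.send;"
        + " XMLHttpRequest.prototype.send = function () {"
        + "  var xhr = this;"
        + "  var done = function () { window.__pendingXhr--; xhr.removeEventListener('loadend', done); };"
        + "  window.__pendingXhr++;"
        + "  xhr.addEventListener('loadend', done);"
        + "  try { return origSend.apply(xhr, arguments); }"
        + "  catch (e) { done(); throw e; }" // send() threw: undo the increment and drop the listener
        + " };"
        + "})();";

    static final String PENDING =
        "return typeof window.__pendingXhr === 'number' ? window.__pendingXhr : -1;";

    // Number of XHRs currently in flight, or -1 if INSTALL has not run on the current page.
    static long pending(Function<String, Object> run) {
        Object v = run.apply(PENDING);
        return v instanceof Number ? ((Number) v).longValue() : -1L;
    }
}
```

After `js.executeScript(XhrCounter.INSTALL)`, the crawler can scroll and poll until `pending(...)` returns 0, treating -1 as a sign that the page navigated and the script needs re-injecting.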

Ishank

Approach 1:

Your approach is good, just a few changes would do the trick:

Step 1: Improve this step by calling the toBottom function at a regular interval using window.setInterval. When (c >= totalcount), call window.clearInterval.

Step 2: Instead of checking whether the page is still scrollable, check whether (c >= totalcount), and re-check this condition every 200ms until it returns true.

FYI: if Step 1 doesn't work in all browsers, you can probably refer to line 5210 of Tata-Nano-Reviews-925076578.js and call it with the c variable check.

Approach 2:

Go to the jQuery API documentation and search for "ajax". You will find some callback handlers that can be used for Ajax requests.

Set a variable appropriately before the request is sent and after the response is received.

In between, use your original method of scrolling to the bottom at a regular interval until you can no longer scroll. At that point, clear the interval variable.

Now, regularly check if that interval variable is null or not. Null would mean that you have reached the bottom.
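Approach 2 can be sketched as an injected interval plus a Java-side check, under several assumptions. The global name `__scrollTimer`, the 500 ms tick and the "3 quiet ticks" rule are all illustrative, and the busy signal uses `jQuery.active` rather than the `ajaxSend`/`ajaxComplete` callbacks the answer suggests. The interval scrolls to the bottom on every tick, and once the page height has stayed unchanged for 3 ticks while jQuery (if present) is idle, it clears itself and sets the variable to null. Java then polls for that null. `run` stands in for `JavascriptExecutor.executeScript`.

```java
import java.util.function.Function;

// Hypothetical sketch of "scroll on an interval, clear it at the bottom, poll for null".
final class IntervalScroller {
    static final String START =
        "(function () {"
        + " var lastHeight = -1, idleTicks = 0;"
        + " window.__scrollTimer = window.setInterval(function () {"
        + "  var h = document.documentElement.scrollHeight;"
        + "  window.scrollTo(0, h);"
        + "  var busy = typeof jQuery !== 'undefined' && jQuery.active > 0;"
        + "  if (busy || h !== lastHeight) { lastHeight = h; idleTicks = 0; return; }"
        + "  if (++idleTicks >= 3) {"
        + "   window.clearInterval(window.__scrollTimer);"
        + "   window.__scrollTimer = null;"
        + "  }"
        + " }, 500);"
        + "})();";

    // True once the interval has cleared itself (undefined, i.e. never started, counts as false).
    static boolean isFinished(Function<String, Object> run) {
        return Boolean.TRUE.equals(run.apply("return window.__scrollTimer === null;"));
    }

    // Polls isFinished until it is true or the timeout expires.
    static boolean waitUntilFinished(Function<String, Object> run, long timeoutMillis, long pollMillis) {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (!isFinished(run)) {
            if (System.currentTimeMillis() >= deadline) {
                return false;
            }
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return true;
    }
}
```

In use, the crawler would call `js.executeScript(IntervalScroller.START)` once and then `waitUntilFinished(s -> js.executeScript(s), 120000, 200)`. The page does the scrolling itself, so Java only makes one cheap check per poll.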

iMatoria
  • Thanks for the response, iMatoria! The requirement is to write generic code that works with any website with Ajax responses during infinite scroll, so I cannot have hard-coded conditions like waiting for 200ms or checking totalcount. So, the best approach seems to be checking the Ajax request status. – Ishank Aug 01 '12 at 09:31
  • Yes, it seems that I have to use the jQuery API for this. But what should I do if the website does not use jQuery and I am unable to use it? (I am constrained by requirements and have no access to the sources.) – Ishank Aug 01 '12 at 09:56
  • You could check for the jQuery variable and, if it is missing, add jQuery from Google's CDN. – iMatoria Aug 01 '12 at 11:05
  • Hey! The jQuery callback methods work only when the page has already loaded the jQuery API. They do not work if I inject jQuery myself. – Ishank Aug 09 '12 at 07:47
  • This seems to provide a way, irrespective of the library used: http://stackoverflow.com/questions/797960/extending-an-activexobject-in-javascript/3202098#3202098 – Ishank Aug 20 '12 at 09:16

We had to solve the same problem and managed it with a long JavaScript function. You just need to add checks to see which libraries are defined.

PS: Thanks for giving me an easy answer on how to check for in-progress Prototype requests!

E.g., handle jQuery and XHR/Prototype (C#):

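// needs: using System; using System.Threading; using OpenQA.Selenium;
var jsExecutor = driver as IJavaScriptExecutor;   // driver: your IWebDriver instance
var deadline = DateTime.Now.AddSeconds(30);       // your required timeout here
while (DateTime.Now < deadline)
{
    var ajaxIsComplete = (bool)jsExecutor.ExecuteScript(
        "return ((typeof Ajax === 'undefined') || Ajax.activeRequestCount == 0) && " +
        "((typeof jQuery === 'undefined') || jQuery.active == 0);");
    if (ajaxIsComplete)
        return;
    Thread.Sleep(100);   // avoid hammering the driver between checks
}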
lvanzyl