1

I have the following settings defined for a CasperJS script:

var casper = require('casper').create({
    waitTimeout: 50000,
    stepTimeout: 50000,
    verbose: true,
    viewportSize: {
      width: 1400,
      height: 768
    },
    pageSettings: {
      "userAgent": 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/38.0.2125.104 Safari/537.36',
      "loadImages": false,
      "loadPlugins": false,         
      "webSecurityEnabled": false,
      "ignoreSslErrors": true
    },
    onStepTimeout: function() {
      this.echo("Step timed out ");
      var step = casper.getStepNumber();
      casper.gotoStep(step+1);
    }
});

I have added these functions in casperjs modules:

Casper.prototype.getStepNumber = function getStepNumber() {
    "use strict";
    return this.step;
};
Casper.prototype.gotoStep = function gotoStep(stepNum) {
    "use strict";
     var steps = this.steps,
         last = steps.length;
     this.checkStarted();
     this.clear();
     this.step = Math.min(stepNum,last);
     return this;
};

And I have a list of urls in an array 'urlArray'. I am opening all of these urls one by one as follows:

casper.start().each(urlArray, function(self, url) {
    casper.thenOpen(url, function() {
        this.echo("INFO:"+"\t"+url+"\t"+"Opened."+"\n");
    }); 
});

After opening each URL, I look for a particular string in the requested resources. Once it shows up, I print that resource URL to stdout and abort the request, as follows:

casper.on('resource.requested', function(resource,request) {
    var url = resource.url;
    if(url.indexOf("some string") !== -1) {
        this.echo("url: "+url);
        request.abort();
    }
});

The problem: CasperJS moves on to the next page (from urlArray) before it reaches the resource URL I am looking for, and in some cases I get a stepTimeout. How can I make CasperJS wait until the resource URL I am looking for is requested, without hitting stepTimeout (let's say stepTimeout is 60 seconds) and without skipping the current URL?

Current output is:

INFO: url1 Opened.
INFO: url2 Opened.
INFO: url3 Opened.
prints the resource url that I am looking for.
INFO: url4 Opened.
INFO: url5 Opened.
INFO: url6 Opened.
INFO: url7 Opened.
INFO: url8 Opened.
prints the resource url that I am looking for.
INFO: url9 Opened.
INFO: url10 Opened.

Note: All the URLs that I am crawling contain the resource URL that I am searching for.

Som
  • Ah, the problem you are getting is a bit clearer to understand now. Does the `onStepTimeout` trigger after your set time of `50000` i.e. 50 seconds? or is it instantaneous? – Pebbl Oct 23 '14 at 07:51

2 Answers

1

All wait* and then* functions are steps in CasperJS. So stepTimeout is used in all of them whereas waitTimeout is only used for the wait* functions.

stepTimeout:

Type: Number
Default: null
Max step timeout in milliseconds; when set, every defined step function will have to execute before this timeout value has been reached. You can define the onStepTimeout() callback to catch such a case. By default, the script will die() with an error message.

The above documentation tells you everything you need to know. Either you don't set stepTimeout or you overwrite the handler casper.options.onStepTimeout to something that doesn't die().
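A minimal sketch of such a handler. It assumes the callback is invoked with the configured timeout and the step number, and with `this` bound to the casper instance; check both against your CasperJS version. The `options` variable name is only for illustration.

```javascript
// Options object intended for require('casper').create(options).
var options = {
    stepTimeout: 60000, // 60 seconds per step
    // Assumption: called as onStepTimeout(timeout, stepNum),
    // with `this` bound to the casper instance.
    onStepTimeout: function (timeout, stepNum) {
        // Log instead of calling this.die(), which is what the
        // default handler does to terminate the script.
        this.echo("Step " + stepNum + " timed out after " + timeout + " ms");
    }
};

// In a CasperJS script:
// var casper = require('casper').create(options);
```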

The reason you're having this problem is probably that the thenOpen step is bound to opening the page. If the page doesn't finish loading within stepTimeout, the timeout fires and the default handler kills the script.

Other considerations:

You say you want to wait until a specific resource is requested, but you don't want to actually load it. It seems that you're not talking about the page itself, but about some resource of the page (js, css, img, ajax calls, etc.). You should change the event handler from page.resource.requested to resource.requested.
While you're at it, change url.indexOf("some string") to url.indexOf("some string") !== -1. indexOf returns 0 (falsy) for a match at the very start of the URL and -1 (truthy) for no match, so without the comparison you cannot match the protocol of the URL.
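To illustrate the indexOf point with plain JavaScript (the example.com URLs are made up for the illustration):

```javascript
var httpUrl = "http://example.com/a.js";   // starts with the search string
var httpsUrl = "https://example.com/a.js"; // does not contain it at all

// Bare indexOf used as a condition: a match at position 0 is falsy,
// while "not found" (-1) is truthy.
var bareHit = Boolean(httpUrl.indexOf("http://"));   // false
var bareMiss = Boolean(httpsUrl.indexOf("http://")); // true

// Comparing against -1 gives the intended answer.
var hit = httpUrl.indexOf("http://") !== -1;   // true
var miss = httpsUrl.indexOf("http://") !== -1; // false
```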

If the resource indeed exists on every page that you load, then you can wait for it with waitForResource:

casper.start().each(urlArray, function(self, url) {
    casper.thenOpen(url, function() {
        this.echo("INFO:"+"\t"+url+"\t"+"Opened."+"\n");
    }).waitForResource(function test(resource){
        return resource.url.indexOf("some string") !== -1;
    }, function then(){
        this.echo("INFO: resource loaded");
    });
});

But then you cannot abort the request in the event handler, because waitForResource waits for the resource to be received, and an aborted request will probably never show up as received.
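One possible way to keep the abort, as an untested sketch rather than a recipe: record the match in a flag inside the resource.requested handler and poll that flag with waitFor instead of waitForResource. This is a different technique from the waitForResource snippet. The helper name `queueUrls` and the `found` flag are made up for this example; the wait uses the configured waitTimeout.

```javascript
// Queues, for each URL, a reset step, an open step and a wait step.
// `casper` is a started Casper instance, `needle` the substring to look for.
function queueUrls(casper, urls, needle) {
    var found = false; // set by the event handler, polled by waitFor

    casper.on('resource.requested', function (requestData, request) {
        if (requestData.url.indexOf(needle) !== -1) {
            found = true;
            this.echo("url: " + requestData.url);
            // The wait below only looks at the flag, not at the response.
            request.abort();
        }
    });

    urls.forEach(function (url) {
        casper.then(function () {
            found = false; // reset before each page is opened
        });
        casper.thenOpen(url, function () {
            this.echo("INFO:\t" + url + "\tOpened.");
        });
        casper.waitFor(function check() {
            return found;
        }, function then() {
            this.echo("INFO:\t" + url + "\tresource seen.");
        }, function onTimeout() {
            this.echo("WARN:\t" + url + "\tresource not seen before waitTimeout.");
        });
    });
    return casper;
}
```

In a script this might be called as `queueUrls(casper.start(), urlArray, "some string").run();`. Because the flag is reset in its own step, a late request from the previous page could in principle still set it, so treat this as a starting point.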

Artjom B.
  • `thenOpen` is successfully opening the url, I have added the following `casper.thenOpen(url, function() { this.echo("INFO:"+"\t"+url+"\t"+"Opened."+"\n"); });` and checked it by enabling `logLevel : debug`. But still I have the same problem. – Som Oct 23 '14 at 07:32
  • Then I don't understand your problem. If all urls are properly opened and the script doesn't die, how are you getting a stepTimeout? – Artjom B. Oct 23 '14 at 07:35
  • I think the issue is because you are using `request.abort();` -- which probably causes the request to act as a failure. Just a guess however. ArtjomB's answer is correct as far as I see it, +1. – Pebbl Oct 23 '14 at 07:37
  • Edited the code with the `stepTimeOut` that I am using – Som Oct 23 '14 at 07:44
  • @ksreddy Ok, added some more thoughts on what you want to achieve, but I think I'm still in the dark. See if some of it catches your eye. – Artjom B. Oct 23 '14 at 12:00
0

Basically, waitTimeout is the time (in milliseconds) that you are willing to wait for any wait*() function to produce a result.

So waitTimeout: 10000 gives any of the wait* functions (waitForUrl(), waitForSelector() and so on) 10 seconds to respond and return something.

stepTimeout by definition is "Max step timeout in milliseconds; when set, every defined step function will have to execute before this timeout value has been reached. You can define the onStepTimeout() callback to catch such a case. By default, the script will die() with an error message."

That means you can set a stepTimeout to force any step function to be executed (or aborted) before the waitTimeout is reached.

Here is an example: CasperJS skip step on timeout

It's useful for a chained action.

EDIT: on CasperJS FAQ there's another example: http://docs.casperjs.org/en/1.1-beta2/faq.html#how-does-then-and-the-step-stack-work

MrPk