Crawler4j and Tripadvisor

Question

I'm writing a crawler for Tripadvisor using crawler4j. I need to collect all the reviews for an item, but the links to the "next" reviews (the numbered ones) don't point to a URL; each one calls a JavaScript function instead. That function is defined somewhere on Tripadvisor's servers. Is there a way to evaluate these functions and get the page they return?

score 0 · Answer 1 · answered Jun 27 '12 at 10:51

Have you tried eval? Or call, if you need to change the calling context.

eval takes a string as input and tries to execute it as code.

nemo

score 0 · Answer 2 · edited May 23 '17 at 12:12

You can use HtmlUnit to get the page content. This library can execute the page's JavaScript, after which you can retrieve the resulting page source and manipulate it.

Here is some example code, taken from a question on Stack Overflow.

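    // Assumes `page` is an HtmlPage already loaded through an HtmlUnit WebClient.
    // Needs java.util.Iterator plus HtmlElement, HtmlImage and HtmlPage from HtmlUnit.
    HtmlElement element4 = null;
    Iterable<HtmlElement> iterable5 = page.getAllHtmlChildElements();
    Iterator<HtmlElement> i6 = iterable5.iterator();
    while (i6.hasNext()) {
        HtmlElement anElement = i6.next();
        if (anElement instanceof HtmlImage) {
            HtmlImage input = (HtmlImage) anElement;
            String[] elements = "http://example.com/pages/powerbutton.png".split("/");

            // Match the image by the last path segment of the URL (its file name).
            if (input.getSrcAttribute().indexOf(elements[elements.length - 1]) > -1) {
                element4 = input;
                break;
            }
        }
    }
    // Clicking runs the element's JavaScript handler and returns the resulting page.
    // A new variable name is used because `page` is already declared above.
    HtmlPage nextPage = element4.click();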
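
The matching step in that loop (take the file name from a reference URL, then check whether an image's src contains it) can be tried on its own without HtmlUnit. The following is only a sketch using the standard library; `ImageNameMatcher`, `lastPathSegment` and `srcMatches` are hypothetical names made up for illustration, and the sample src values are invented.

```java
// Hypothetical helper, not part of HtmlUnit: isolates the file-name check
// that the loop above applies to each image's src attribute.
final class ImageNameMatcher {

    private ImageNameMatcher() {
        // Static helpers only.
    }

    /** Returns the text after the last '/' in the URL, or the whole string if there is none. */
    static String lastPathSegment(String url) {
        return url.substring(url.lastIndexOf('/') + 1);
    }

    /** True when src contains the file name taken from referenceUrl. */
    static boolean srcMatches(String src, String referenceUrl) {
        String fileName = lastPathSegment(referenceUrl);
        // An empty file name (URL ending in '/') would match every src, so reject it.
        return !fileName.isEmpty() && src.contains(fileName);
    }

    public static void main(String[] args) {
        String reference = "http://example.com/pages/powerbutton.png";
        System.out.println(lastPathSegment(reference));
        System.out.println(srcMatches("/static/img/powerbutton.png", reference));
        System.out.println(srcMatches("/static/img/logo.png", reference));
    }
}
```

For a crawler, the same idea would apply to whatever attribute identifies the "next" link; matching on a file name is just what the quoted example happens to do.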
