I'm writing a crawler for Tripadvisor using crawler4j. I need to collect all the reviews for an item, but the links to the "next" reviews (the numbered ones) don't point to a URL; each one calls a JavaScript function instead. That function is defined somewhere on Tripadvisor's servers. Is there a way to evaluate these functions and get the page they return?

2 Answers

Have you tried eval? Or call, if you need to change the calling context (the value of this).

eval takes a string as input and tries to execute it as JavaScript code.

nemo

You can use HtmlUnit to get the page content. It is a headless browser library for Java that can run the page's JavaScript, so you can trigger the link and then work with the resulting page.

Here is some example code, taken from a question on Stack Overflow.

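    // Assumes `page` is an HtmlPage already loaded through an HtmlUnit WebClient.
    // Needs java.util.Iterator plus HtmlElement, HtmlImage and HtmlPage from HtmlUnit's html package.
    HtmlElement element4 = null;
    // Older HtmlUnit API; recent versions call this getHtmlElementDescendants().
    Iterable<HtmlElement> iterable5 = page.getAllHtmlChildElements();
    Iterator<HtmlElement> i6 = iterable5.iterator();
    // The last path segment is the file name of the image to look for.
    String[] elements = "http://example.com/pages/powerbutton.png".split("/");
    while (i6.hasNext()) {
        HtmlElement anElement = i6.next();
        if (anElement instanceof HtmlImage) {
            HtmlImage input = (HtmlImage) anElement;
            // Stop at the first image whose src contains that file name.
            if (input.getSrcAttribute().indexOf(elements[elements.length - 1]) > -1) {
                element4 = input;
                break;
            }
        }
    }
    // Clicking fires the element's JavaScript handlers and returns the resulting page.
    // `page` is already declared, so the result gets its own name.
    // element4 is still null here if no image matched.
    HtmlPage nextPage = element4.click();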
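
Since crawler4j itself only follows plain URLs, it can help to first find which links on a fetched page are script-driven, so that only those pages are handed to HtmlUnit. The following is a minimal sketch using only the JDK. The class name `ScriptLinkFinder` and the method `findScriptLinks` are made up for illustration, and the regular expressions are a rough heuristic that assumes ordinary `<a>` markup; it is not a full HTML parser and says nothing about how Tripadvisor's pages are actually built.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: lists the <a> tags that rely on JavaScript instead of a plain URL.
class ScriptLinkFinder {
    // Matches an opening <a ...> tag. Deliberately simple: it stops at the first '>'.
    private static final Pattern ANCHOR =
            Pattern.compile("<a\\b[^>]*>", Pattern.CASE_INSENSITIVE);

    // A tag counts as script-driven if its href starts with "javascript:"
    // or if it carries an onclick attribute.
    private static final Pattern SCRIPTED = Pattern.compile(
            "\\bhref\\s*=\\s*[\"']\\s*javascript:|\\bonclick\\s*=",
            Pattern.CASE_INSENSITIVE);

    // Returns the opening tags of all script-driven links found in the given HTML.
    static List<String> findScriptLinks(String html) {
        List<String> result = new ArrayList<>();
        Matcher m = ANCHOR.matcher(html);
        while (m.find()) {
            String tag = m.group();
            if (SCRIPTED.matcher(tag).find()) {
                result.add(tag);
            }
        }
        return result;
    }
}
```

If the returned list is empty, the page can be crawled as usual; otherwise the page is a candidate for loading in HtmlUnit and clicking the matching elements.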
cuneytykaya