I am using crawler4j to crawl a website. Some of the site's URLs have parameters at the end, e.g. http://www.abcd.com/xyz/?pqrs

When shouldVisit() is called for such a URL, I get the WebURL as http://www.abcd.com/xyz/?pqrs, but when visit() is called for the same URL, I get it as http://www.abcd.com/xyz/.

How can I visit a page with parameters at the end of its URL?

working

1 Answer

crawler4j visits pages with such parameters by default.

Do you mean that you can't get the URL with its parameters inside the visit() method?

In the code below, url holds the string http://www.abcd.com/xyz/?pqrs and parentUrl holds http://www.abcd.com/xyz/

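@Override
public void visit(Page page) {
    // URL of the page being visited, e.g. http://www.abcd.com/xyz/?pqrs
    String url = page.getWebURL().getURL();
    // URL of the page on which this link was found
    String parentUrl = page.getWebURL().getParentUrl();
}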
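
To confirm whether the query string is still attached to the URL you receive in visit(), a small check like the one below might help. This is only a minimal sketch using java.net.URI from the standard library; UrlQueryCheck, queryOf and hasQuery are hypothetical helper names and are not part of crawler4j.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper (not part of crawler4j): inspects the query part of a URL string.
class UrlQueryCheck {

    // Returns the raw query string (the part after '?'), or null if the URL
    // has no query or cannot be parsed.
    static String queryOf(String url) {
        try {
            return new URI(url).getRawQuery();
        } catch (URISyntaxException e) {
            return null;
        }
    }

    // True when the URL carries a non-empty query string.
    static boolean hasQuery(String url) {
        String query = queryOf(url);
        return query != null && !query.isEmpty();
    }

    public static void main(String[] args) {
        System.out.println(hasQuery("http://www.abcd.com/xyz/?pqrs")); // true
        System.out.println(hasQuery("http://www.abcd.com/xyz/"));      // false
    }
}
```

Inside visit(Page page) you could call UrlQueryCheck.hasQuery(page.getWebURL().getURL()) and log the result, which should show whether you are reading the page's own URL or the parent URL.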

Hope this helps.

Kumar