I'm new to crawler4j. I crawled a website to a certain depth and found what I was searching for. Now I am trying to retrace my steps and find out how I got to that page: I need the list of links that led me to the page containing the content I was looking for.
My attempt was to override the visit method in my crawler:
@Override
public void visit(Page page) {
    String url = page.getWebURL().getURL();

    // condition for content found
    boolean contentFound = false;
    // compute 'content found' here

    if (contentFound) {
        System.out.println(page.getWebURL().getParentUrl());
        getMyController().shutdown();
    }
}
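Because getParentUrl() only goes back one hop, one possible workaround is to keep your own bookkeeping: store every visited URL together with its parent URL in a thread-safe map (crawler4j can run several crawler threads), then walk that map backwards once the content is found. This is a minimal sketch under assumptions: ParentTracker, record and pathTo are hypothetical names, not crawler4j API, and it assumes seed pages report a null parent URL. The idea would be to call tracker.record(url, page.getWebURL().getParentUrl()) at the top of visit(), before the contentFound check, and tracker.pathTo(url) when the content is found.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical helper (not part of crawler4j): remembers the parent URL of each visited URL. */
public class ParentTracker {
    // url -> parent url; ConcurrentHashMap because visit() may run on several crawler threads
    private final Map<String, String> parentOf = new ConcurrentHashMap<>();

    /** Meant to be called from visit() with the page URL and page.getWebURL().getParentUrl(). */
    public void record(String url, String parentUrl) {
        // ConcurrentHashMap rejects null values, so pages without a parent (assumed: seeds) are not stored
        if (parentUrl != null) {
            parentOf.putIfAbsent(url, parentUrl);
        }
    }

    /** Returns the chain from the oldest known ancestor down to url (url itself is last). */
    public List<String> pathTo(String url) {
        List<String> path = new ArrayList<>();
        Set<String> seen = new HashSet<>(); // guards against looping forever on cyclic data
        String current = url;
        while (current != null && seen.add(current)) {
            path.add(current);
            current = parentOf.get(current); // null once we reach a page with no recorded parent
        }
        Collections.reverse(path); // collected child-to-parent, so flip to start-to-target order
        return path;
    }

    public static void main(String[] args) {
        ParentTracker tracker = new ParentTracker();
        tracker.record("http://example.com/", null); // example seed
        tracker.record("http://example.com/a", "http://example.com/");
        tracker.record("http://example.com/a/b", "http://example.com/a");
        // prints [http://example.com/, http://example.com/a, http://example.com/a/b]
        System.out.println(tracker.pathTo("http://example.com/a/b"));
    }
}
```

Keying on URLs keeps the output directly readable. Note that if a page is linked from several places, the map holds only whichever parent crawler4j reported for it, so the result is one path to the page, not necessarily every path.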
This only gives me the parent URL as a String, and

page.getWebURL().getParentDocid();

only gives me the document ID of the parent. How can I find out the parent of that parent page, and so on, all the way back?
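getParentDocid() returns only an int, which helps with tracing back only if you also keep your own index from document ID to page. A minimal sketch under that assumption: DocIdTrail, record and trail are hypothetical names, not crawler4j API. It would be fed from visit() with getDocid(), getURL() and getParentDocid() of page.getWebURL(), and it assumes a seed's parent docid is never itself a recorded docid, so the walk stops there.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical helper (not part of crawler4j): docid -> (url, parent docid) index. */
public class DocIdTrail {
    /** What is remembered about one visited page. */
    private static final class Visited {
        final String url;
        final int parentDocid;

        Visited(String url, int parentDocid) {
            this.url = url;
            this.parentDocid = parentDocid;
        }
    }

    // Thread-safe because visit() may be called from several crawler threads at once
    private final Map<Integer, Visited> byDocid = new ConcurrentHashMap<>();

    /** Meant to be called from visit() with the WebURL's docid, URL and parent docid. */
    public void record(int docid, String url, int parentDocid) {
        byDocid.putIfAbsent(docid, new Visited(url, parentDocid));
    }

    /** URLs from the oldest recorded ancestor down to docid; stops at the first unrecorded parent. */
    public List<String> trail(int docid) {
        Deque<String> urls = new ArrayDeque<>();
        Set<Integer> seen = new HashSet<>(); // guards against cycles in inconsistent data
        int current = docid;
        Visited v = byDocid.get(current);
        while (v != null && seen.add(current)) {
            urls.addFirst(v.url); // prepend, so the earliest ancestor ends up first
            current = v.parentDocid;
            v = byDocid.get(current);
        }
        return new ArrayList<>(urls);
    }

    public static void main(String[] args) {
        DocIdTrail trail = new DocIdTrail();
        // example seed; 0 is an assumed "no parent" value that is never recorded as a docid
        trail.record(1, "http://example.com/", 0);
        trail.record(2, "http://example.com/a", 1);
        trail.record(7, "http://example.com/a/b", 2);
        // prints [http://example.com/, http://example.com/a, http://example.com/a/b]
        System.out.println(trail.trail(7));
    }
}
```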
Thanks!