
Hi, how do I get the seed that a page originated from inside crawler4j's visit function? So far I have the URL of the page, but I can't figure out which seed led to it.

public void visit(Page page) {

    String url = page.getWebURL().getURL();
}
pinpox

1 Answer

page.getWebURL().getParentUrl();
omerfarukdemir
  • This only gets the parent URL one generation up. If the page is at a depth of more than 1, this won't give you the original seed. – Ephraim Aug 18 '14 at 09:18
  • If you store the parent URLs of all crawled pages, you can build a parent-child URL tree and walk it back to the original seed. – omerfarukdemir Aug 18 '14 at 14:29
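
The approach from the last comment could be sketched roughly as follows. This is only a minimal illustration, not something built into crawler4j: `SeedTracker`, `record`, and `findSeed` are hypothetical names. It assumes you call `record(...)` from `visit` with `page.getWebURL().getURL()` and `page.getWebURL().getParentUrl()`, and that seed pages report a null parent URL (worth verifying against your crawler4j version). The methods are synchronized because crawler4j usually runs several crawler threads, so a single tracker instance would be shared between them.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not part of crawler4j.
class SeedTracker {
    // Maps each crawled URL to the URL of the page that linked to it.
    private final Map<String, String> parentOf = new HashMap<>();

    // Intended to be called from visit(), e.g.
    // record(page.getWebURL().getURL(), page.getWebURL().getParentUrl());
    // Assumption: a seed is recorded with a null parent.
    synchronized void record(String url, String parentUrl) {
        parentOf.putIfAbsent(url, parentUrl);
    }

    // Walks the parent chain upwards until a URL with no known parent is found.
    synchronized String findSeed(String url) {
        Set<String> seen = new HashSet<>();
        String current = url;
        while (seen.add(current)) {          // stops if a URL repeats (cycle guard)
            String parent = parentOf.get(current);
            if (parent == null) {
                return current;              // no parent recorded: treat as the seed
            }
            current = parent;
        }
        return current;                      // only reached if the chain loops
    }
}
```

Since a page is visited after the page that linked to it, the chain for the current URL should already be recorded by the time `findSeed` is called from `visit`. Note that a URL the tracker has never seen is returned unchanged, so the result is only meaningful for pages that were recorded.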