Get seed of URL in crawler4j visit()

Question

Hi, how do I get the seed that a page originated from inside crawler4j's visit() function? So far I have the URL of the page, but I can't figure out which seed led to it.

@Override
public void visit(Page page) {
    // URL of the page currently being visited
    String url = page.getWebURL().getURL();
}

No one knows, or is the question stupid? – pinpox, Jul 18 '14 at 14:07

score 0 · Answer 1 · answered Aug 16 '14 at 20:09

page.getWebURL().getParentUrl();

omerfarukdemir

This only gets the parent URL one generation up. If the page is at a depth of more than 1, it won't give you the original seed. – Ephraim, Aug 18 '14 at 09:18
If you store the parent URLs of all crawled pages, you can build a parent-child URL tree and walk it back to the original seed. – omerfarukdemir, Aug 18 '14 at 14:29
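
The parent-tracking idea from the last comment could be sketched roughly as follows. This is only an illustration, not part of crawler4j: the `SeedTracker` class and its `record` and `findSeed` methods are hypothetical names, and the sketch assumes that seed pages report no parent (a null parent URL) and that each URL is recorded with the first page that linked to it.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical helper (not a crawler4j class): remembers which page led to
// which, so the chain of parents can be followed back to a seed.
class SeedTracker {
    // child URL -> parent URL; concurrent because visit() may be called
    // from several crawler threads at once.
    private final Map<String, String> parentOf = new ConcurrentHashMap<>();

    // Record one visited page. Seeds are assumed to have a null parent,
    // so they are simply not stored.
    void record(String url, String parentUrl) {
        if (parentUrl != null && !parentUrl.equals(url)) {
            parentOf.putIfAbsent(url, parentUrl); // keep the first parent seen
        }
    }

    // Follow parent links until a URL with no recorded parent is reached.
    // An unknown URL is returned unchanged.
    String findSeed(String url) {
        Set<String> seen = new HashSet<>();
        String current = url;
        while (seen.add(current)) {
            String parent = parentOf.get(current);
            if (parent == null) {
                return current; // no parent recorded: treat as the seed
            }
            current = parent;
        }
        return current; // cycle guard: stop at the first repeated URL
    }
}
```

Inside visit(), a shared instance (for example a static field on the crawler class) would be fed with record(page.getWebURL().getURL(), page.getWebURL().getParentUrl()), and findSeed(url) could then be called for the same URL. Because the parent is recorded before any of its children are visited, the chain should already be complete at that point.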
