This question may sound stupid, but I spent hours researching and couldn't find a solution, so if anyone knows, that would be great!
I can successfully read an ARC file (from the Common Crawl dataset). With arcHeader.getUrl()
I get all the URLs. However, I don't understand whether the 'outgoing' links from a particular URL are stored there and, if they are, how to get them.
[PS] By 'outgoing' I mean the URLs that appear anywhere in the page, for example in ads, content, etc. Does the Common Crawl ARC file contain those, and if so, how do I get them?
Thanks in advance!
EDIT: I solved this. I read the HTML content of each record and extracted all the links from it. It wasn't that difficult!
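
For anyone who lands here later, this is a minimal sketch of that idea, not the exact code I used. It assumes you have already read the HTML body of one ARC record into a String and have the page URL from arcHeader.getUrl(). The class name OutgoingLinks and the method extractLinks are made up for illustration, and the regex is only a rough heuristic that looks at quoted href attributes; a real HTML parser (jsoup, for example) would probably be more robust.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of any Common Crawl or ARC reader library.
public class OutgoingLinks {

    // Matches href="..." or href='...'; a rough heuristic, not a full HTML parser.
    private static final Pattern HREF = Pattern.compile(
            "href\\s*=\\s*(?:\"([^\"]*)\"|'([^']*)')",
            Pattern.CASE_INSENSITIVE);

    // html:    the page content read from one ARC record (assumed already decoded)
    // pageUrl: the URL of that record, e.g. the value of arcHeader.getUrl()
    //          (assumed to be a syntactically valid URI)
    static List<String> extractLinks(String html, String pageUrl) {
        URI base = URI.create(pageUrl);
        List<String> links = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            // Exactly one of the two groups matches, depending on the quote style.
            String raw = m.group(1) != null ? m.group(1) : m.group(2);
            try {
                // Resolve relative links such as "/about" against the page URL.
                links.add(base.resolve(raw.trim()).toString());
            } catch (IllegalArgumentException e) {
                // Skip href values that are not valid URIs.
            }
        }
        return links;
    }

    public static void main(String[] args) {
        String html = "<a href=\"/about\">About</a> "
                + "<a href='http://ads.example.net/x'>Ad</a>";
        for (String link : extractLinks(html, "http://example.com/index.html")) {
            System.out.println(link);
        }
    }
}
```

You would call extractLinks once per record, passing the record's content and its URL. Unquoted href values and links built by JavaScript are not picked up by this sketch.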