4

I am developing an application in which I have to check whether a link to a given URL exists on botw.org. Is there any free API for checking botw.org, or any other way to do this?

thanks!

Machavity
  • 30,841
  • 27
  • 92
  • 100
Tokendra Kumar Sahu
  • 3,524
  • 11
  • 28
  • 29

3 Answers

1

You need a crawler. It's pretty trivial to build one yourself (for small crawls!):

  • hit the front page (see below **)
  • parse the page and extract the links. For this you need an HTML parser that can handle badly formatted HTML. Try Jericho, TagSoup, CyberNeko or HtmlTidy; a normal XML parser probably won't cut the mustard for most HTML pages, as they are often not well-formed XML.
  • check for the link you are looking for. If you cannot find it, add any site-local links you have not seen before to your queue, go back to step 1, and repeat.
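The parsing step above can be sketched with a throwaway regex for illustration only; this `LinkExtractor` class and its pattern are hypothetical, and a real HTML parser like the ones named above is far more robust against messy markup:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Rough stand-in for an HTML parser: pull href values out of anchor tags.
// Will miss unquoted attributes and anything unusual; use Jericho/TagSoup
// for real crawls.
public class LinkExtractor {
    private static final Pattern HREF =
        Pattern.compile("(?i)<a\\s[^>]*href\\s*=\\s*[\"']([^\"']+)[\"']");

    public static List<String> extractLinks(String html) {
        List<String> links = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            links.add(m.group(1));
        }
        return links;
    }
}
```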

For a small site (few thousand pages) you can probably do all this in memory.

** Use the usual Java URLConnection or Apache Commons HttpClient (v4) for making the requests.
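A minimal sketch of that request step with the plain JDK `URLConnection` (charset detection, redirects, timeouts, and robots.txt politeness are all ignored here):

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;

// Fetch a URL's body as a String. Assumes UTF-8; a real crawler should
// read the charset from the Content-Type header.
public class Fetcher {
    public static String fetch(String url) throws IOException {
        URLConnection conn = new URL(url).openConnection();
        try (InputStream in = conn.getInputStream()) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
    }
}
```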

Note on finding your link: links can appear on a site in absolute form, as local (relative) links, or resolved against some base href. You'll need to account for this when looking for yours. The easiest approach is to translate all links to absolute form, taking care to resolve against the current page's base href, if it has one.
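The JDK's `java.net.URI` already implements that resolution (per RFC 3986), so normalising each extracted link to absolute form is a one-liner; `base` should be the page's `<base href>` if it has one, otherwise the page's own URL:

```java
import java.net.URI;

// Resolve a possibly-relative href against a base URL, yielding the
// absolute form to compare against the link you are looking for.
public class LinkResolver {
    public static String toAbsolute(String base, String href) {
        return URI.create(base).resolve(href).toString();
    }
}
```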

Simples.

Joel
  • 29,538
  • 35
  • 110
  • 138
0

You will have to crawl the site, parse each page for anchors, and test whether they match what you are looking for... assuming the site isn't using JavaScript to create its links (few sites do).

A standard XML parser will work; SAX is probably the easiest to learn.

david
  • 726
  • 1
  • 5
  • 10
  • 2
    A standard XML parser will very likely NOT work with HTML, given how badly formed it usually is. – Joel Mar 02 '11 at 17:16
0

You can use the search page:

Example: http://search.botw.org/search?q=stackoverflow.com

Instead of crawling the entire site, you can just verify whether you get a good result there.
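A hedged sketch of that idea, again with the plain JDK: query the search page and see whether the result body mentions the site at all. The `BotwChecker` class is hypothetical, and the assumption that a plain substring match on the result HTML distinguishes a hit from an empty result page is something you would need to verify by hand against real responses:

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Check the botw.org search page for a site instead of crawling.
public class BotwChecker {
    // ASSUMPTION: a substring hit in the result HTML means the site is listed.
    static boolean mentions(String resultHtml, String site) {
        return resultHtml.contains(site);
    }

    public static boolean isListed(String site) throws IOException {
        String q = URLEncoder.encode(site, StandardCharsets.UTF_8);
        URL url = new URL("http://search.botw.org/search?q=" + q);
        try (InputStream in = url.openStream()) {
            String body = new String(in.readAllBytes(), StandardCharsets.UTF_8);
            return mentions(body, site);
        }
    }
}
```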

andreialecu
  • 3,639
  • 3
  • 28
  • 36