Parser to extract URIs from RDF/XML web page for my Web-Crawler in Java

Question

I am building a web crawler for Linked Data. I distinguish between HTML and RDF/XML pages with the following code:

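    import java.io.IOException;
    import java.net.HttpURLConnection;
    import java.net.URL;

    // Returns 1 for RDF/XML and 0 for HTML or any other (or missing) content type.
    public static int checktype(URL url) throws IOException
    {
        String contentType = ((HttpURLConnection) url.openConnection()).getContentType();
        System.out.println("Website is read");
        System.out.println(contentType);
        if (contentType == null) { return 0; }
        // Drop parameters such as "; charset=UTF-8" before comparing.
        String mediaType = contentType.split(";")[0].trim();
        int t = 0;
        if ("text/html".equalsIgnoreCase(mediaType)) { t = 0; }
        else if ("application/rdf+xml".equalsIgnoreCase(mediaType)) { t = 1; }
        return t;
    }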

Now I want to parse a web page containing RDF/XML data and extract all URIs from it. I can find HTML parsers, but none for Linked Data. How can I do this?

score 2 · Answer 1 · answered Sep 21 '12 at 10:54

You're probably better off using an existing library, for example Apache Any23, which already comes with code for automatically distinguishing between different formats and parsers for all the formats.

cygri

score 1 · Answer 2 · answered Sep 21 '12 at 10:45

See the Jena library. It includes an RDF/XML parser.
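
If adding a dependency is not an option, a rough JDK-only sketch of the idea follows. It is not a real RDF/XML parser and is not how Jena works: it only collects the values of `rdf:about` and `rdf:resource` attributes with a namespace-aware SAX parser. The class name `RdfXmlUriExtractor` and method `extractUris` are made up for illustration. It does not resolve relative URIs against `xml:base`, ignores `rdf:ID`, and skips the property and class URIs implied by element names, all of which a proper RDF library handles.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashSet;
import java.util.Set;
import javax.xml.XMLConstants;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: collects rdf:about / rdf:resource values from RDF/XML.
public class RdfXmlUriExtractor {
    static final String RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";

    public static Set<String> extractUris(InputStream in) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        // Needed so attributes are matched by namespace URI, not by prefix.
        factory.setNamespaceAware(true);
        // Crawled documents are untrusted, so ask for secure-processing limits.
        factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);

        // LinkedHashSet removes duplicates and keeps document order.
        Set<String> uris = new LinkedHashSet<>();
        factory.newSAXParser().parse(in, new DefaultHandler() {
            @Override
            public void startElement(String ns, String local, String qName, Attributes atts) {
                for (int i = 0; i < atts.getLength(); i++) {
                    String name = atts.getLocalName(i);
                    boolean isRdfAttr = RDF_NS.equals(atts.getURI(i));
                    if (isRdfAttr && ("about".equals(name) || "resource".equals(name))) {
                        uris.add(atts.getValue(i));
                    }
                }
            }
        });
        return uris;
    }

    public static void main(String[] args) throws Exception {
        // Tiny made-up document; a crawler would pass the connection's input stream.
        String xml =
            "<rdf:RDF xmlns:rdf=\"" + RDF_NS + "\" xmlns:foaf=\"http://xmlns.com/foaf/0.1/\">"
          + "<rdf:Description rdf:about=\"http://example.org/alice\">"
          + "<foaf:knows rdf:resource=\"http://example.org/bob\"/>"
          + "</rdf:Description></rdf:RDF>";
        InputStream in = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
        for (String uri : extractUris(in)) {
            System.out.println(uri);
        }
    }
}
```

For anything beyond a quick experiment, a real RDF parser is the safer choice, since RDF/XML has several abbreviated syntaxes that this attribute scan will miss.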

Pierre
