2

I am building a web crawler for Linked Data. I distinguish between HTML and RDF/XML pages with the following code:

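import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

public static int checktype(URL url) throws IOException
{
    String contentType = ((HttpURLConnection) url.openConnection()).getContentType();
    System.out.println("Website is read");
    System.out.println(contentType);
    // The header may be null or carry parameters such as "text/html; charset=UTF-8",
    // so compare only the media type that precedes any ';'.
    String mediaType = (contentType == null) ? "" : contentType.split(";")[0].trim();
    int t = 0; // 0 = HTML (also the fallback for unknown types), 1 = RDF/XML
    if ("text/html".equalsIgnoreCase(mediaType)) { t = 0; }
    else if ("application/rdf+xml".equalsIgnoreCase(mediaType)) { t = 1; }
    return t;
}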

Now I want to parse a web page that contains RDF/XML data and extract all URIs from it. I can find HTML parsers, but none for Linked Data. Any pointers would be appreciated.

Stanislav Kralin
Prannoy Mittal

2 Answers

2

You're probably better off using an existing library, for example Apache Any23, which already comes with code for automatically distinguishing between the different formats, as well as parsers for each of them.

cygri
1

See the Apache Jena library. It contains an RDF/XML parser.
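A minimal sketch of how that could look, assuming Apache Jena 3 or later (the org.apache.jena packages) is on the classpath. The class name RdfUriExtractor and the method extractUris are made up for illustration. It reads the RDF/XML into a Model and collects every URI that appears as a subject, predicate, or object; blank nodes and literals are skipped.

```java
import java.io.InputStream;
import java.util.Set;
import java.util.TreeSet;

import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.ModelFactory;
import org.apache.jena.rdf.model.RDFNode;
import org.apache.jena.rdf.model.Resource;
import org.apache.jena.rdf.model.Statement;
import org.apache.jena.rdf.model.StmtIterator;

// Hypothetical helper class, not part of Jena.
public class RdfUriExtractor {

    // Parses RDF/XML from the stream and returns all URIs found in the triples.
    // baseUri is used to resolve relative URIs in the document.
    public static Set<String> extractUris(InputStream in, String baseUri) {
        Model model = ModelFactory.createDefaultModel();
        model.read(in, baseUri, "RDF/XML");

        Set<String> uris = new TreeSet<>();
        StmtIterator it = model.listStatements();
        while (it.hasNext()) {
            Statement st = it.nextStatement();

            // Subjects can be blank nodes, which have no URI.
            Resource subject = st.getSubject();
            if (subject.isURIResource()) {
                uris.add(subject.getURI());
            }

            // Predicates are always URIs.
            uris.add(st.getPredicate().getURI());

            // Objects can be URIs, blank nodes, or literals.
            RDFNode object = st.getObject();
            if (object.isURIResource()) {
                uris.add(object.asResource().getURI());
            }
        }
        return uris;
    }
}
```

In the crawler from the question, this could be called as RdfUriExtractor.extractUris(url.openStream(), url.toString()) once checktype has reported RDF/XML. Note that the result also contains vocabulary URIs (for example the predicates), so a crawler may want to filter those before following links.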

Pierre