I am trying to crawl DBpedia with Apache Nutch 1.15, but I'm having problems parsing RDF files.
During the parsing phase, I only get this message:
**apache_nutch | Error parsing: http://dbpedia.org/data/Moscow.xml: failed(2,0): Can't retrieve Tika parser for mime-type application/rdf+xml**
Following this reference, I configured my parse-plugins.xml to parse application/rdf+xml like this:
<mimeType name="application/rdf+xml">
  <plugin id="parse-tika" />
  <plugin id="feed" />
</mimeType>
But still, the message persists.
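To rule out a typo in the mapping above, the file can be read back to see which plugin ids it assigns to the mime type. This is only a minimal sketch in Python: the `<parse-plugins>` wrapper element in the sample and the helper name `plugins_for` are assumptions for illustration, and only the `<mimeType>` block comes from my configuration.

```python
import xml.etree.ElementTree as ET

# Assumption: the <mimeType> block sits under a <parse-plugins> root element.
# Only the <mimeType> block itself is taken from the configuration in this post.
SAMPLE = """
<parse-plugins>
  <mimeType name="application/rdf+xml">
    <plugin id="parse-tika" />
    <plugin id="feed" />
  </mimeType>
</parse-plugins>
"""


def plugins_for(xml_text, mime_type):
    """Return the plugin ids mapped to mime_type, in document order.

    Hypothetical helper: it only reads the XML and knows nothing about Nutch.
    """
    root = ET.fromstring(xml_text.strip())
    for mime in root.iter("mimeType"):
        if mime.get("name") == mime_type:
            return [plugin.get("id") for plugin in mime.findall("plugin")]
    return []  # no mapping found for this mime type


if __name__ == "__main__":
    print(plugins_for(SAMPLE, "application/rdf+xml"))
```

This only confirms what the file maps to the mime type. It says nothing about whether the mapped plugin can actually handle application/rdf+xml, which is what the error message is complaining about.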
Even when I use Any23, mapping the parse filter as
<alias name="any23-parserFilter" extension-id="Any23Parser" />
and setting the parsers for the mime type as:
<mimeType name="application/rdf+xml">
  <plugin id="parse-tika" />
  <plugin id="feed" />
</mimeType>
The message still persists.
What am I missing here?