
This endpoint offers an option to return the result of a query in N-Triples format. I want to do the same with the RDF4J library when connecting to the endpoint, and save the result to an N-Triples file.

So far, I've used a graph query (CONSTRUCT):

        .....
        String queryString = prefixes +
                " CONSTRUCT { ?sub ?hasProp ?prop } WHERE { ?sub ?hasProp ?prop FILTER(?sub = yago:Naples) } ";
        GraphQuery graphQuery = con.prepareGraphQuery(QueryLanguage.SPARQL, queryString);
        RDFWriter writer = new NTriplesWriter(System.out);
        graphQuery.evaluate(writer);

Unfortunately, I get the error [Malformed query result from server] (Expected '.', found '–'). In the endpoint's own interface, the result is returned just fine in N-Triples format. Could this be a bug in RDF4J?

> <http://yago-knowledge.org/resource/Naples> <http://yago-knowledge.org/resource/linksTo> <http://yago-knowledge.org/resource/S.S.C._Napoli> .
> <http://yago-knowledge.org/resource/Naples> <http://yago-knowledge.org/resource/linksTo> <http://yago-knowledge.org/resource/Treno_Alta_Velocit\u00E0> .
> <http://yago-know18:50:57.014 [main] ERROR o.e.r.rio.helpers.ParseErrorLogger - [Rio fatal] Expected '.', found '–' (386, -1)
> org.eclipse.rdf4j.query.QueryEvaluationException: Malformed query result from server
>     at org.eclipse.rdf4j.repository.sparql.query.SPARQLGraphQuery.evaluate(SPARQLGraphQuery.java:69)
>     at org.example.Connect.main(Connect.java:60)
> Caused by: org.eclipse.rdf4j.repository.RepositoryException: Malformed query result from server
>     at org.eclipse.rdf4j.http.client.SPARQLProtocolSession.getRDF(SPARQLProtocolSession.java:934)
>     at org.eclipse.rdf4j.http.client.SPARQLProtocolSession.sendGraphQuery(SPARQLProtocolSession.java:463)
>     at org.eclipse.rdf4j.repository.sparql.query.SPARQLGraphQuery.evaluate(SPARQLGraphQuery.java:62)
>     ... 1 more
> Caused by: org.eclipse.rdf4j.rio.RDFParseException: Expected '.', found '–' [line 386]
>     at org.eclipse.rdf4j.rio.helpers.RDFParserHelper.reportFatalError(RDFParserHelper.java:403)
>     at org.eclipse.rdf4j.rio.helpers.AbstractRDFParser.reportFatalError(AbstractRDFParser.java:755)
>     at org.eclipse.rdf4j.rio.turtle.TurtleParser.reportFatalError(TurtleParser.java:1318)
>     at org.eclipse.rdf4j.rio.turtle.TurtleParser.verifyCharacterOrFail(TurtleParser.java:1153)
>     at org.eclipse.rdf4j.rio.turtle.TurtleParser.parseStatement(TurtleParser.java:241)
>     at org.eclipse.rdf4j.rio.turtle.TurtleParser.parse(TurtleParser.java:201)
>     at org.eclipse.rdf4j.rio.turtle.TurtleParser.parse(TurtleParser.java:143)
>     at org.eclipse.rdf4j.http.client.SPARQLProtocolSession.getRDF(SPARQLProtocolSession.java:931)
>     ... 3 more
Jeen Broekstra
Manos Ntoulias

1 Answer


When RDF4J's SPARQLRepository executes a SPARQL query request against this endpoint, the endpoint sends back its response in Turtle format. Unfortunately that response contains a syntax error. What happens is the following:

  1. RDF4J does a query request, indicating several acceptable result formats (including Turtle and N-Triples);
  2. The endpoint executes the query, picks Turtle as the response format, and serializes the query result in Turtle;
  3. RDF4J receives the Turtle data and parses it;
  4. The parsed result is passed to the NTriplesWriter, which then writes it out.

However, the query result document that the endpoint sends back is not syntactically valid Turtle, which causes RDF4J's Turtle parser to abort with an error in step 3.

The problem is this line in the response (line 386):

    yago:Italian_War_of_1494–98 ,

Specifically, the problem is the character between 1494 and 98. Although it looks like a hyphen (-), which would be perfectly legal, it is in fact an en dash (Unicode character U+2013). This is not a legal character in a prefixed name in Turtle.
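Because the two characters look almost identical, it can help to check the code point directly. The following is a minimal sketch in plain Java; the class and method names (`DashCheck`, `firstNonAscii`) are made up for illustration. Note that it only flags non-ASCII characters as suspicious: it is not a full Turtle validity check, since many non-ASCII letters are allowed in prefixed names.

```java
class DashCheck {
    // Returns the index of the first character outside 7-bit ASCII, or -1 if there is none.
    // This is only a diagnostic aid, not a Turtle grammar check.
    static int firstNonAscii(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) > 0x7F) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Local name from the failing line, with the en dash written as a Java escape.
        String localName = "Italian_War_of_1494\u201398";
        int index = firstNonAscii(localName);
        if (index >= 0) {
            // Report the offending code point in the usual U+ notation.
            System.out.printf("U+%04X at index %d%n", (int) localName.charAt(index), index);
        }
    }
}
```

An ordinary hyphen in the same position is U+002D and would not be reported.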

The endpoint's Turtle writer should serialize the value correctly by changing to a full URI instead of a prefixed name, and using a Unicode escape sequence, like so:

    <http://yago-knowledge.org/resource/Italian_War_of_1494\u201398>
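To make the escaping concrete, here is a rough sketch of how a serializer could produce that form. It is plain Java and not taken from Virtuoso or RDF4J; `IriEscaper` and `toEscapedIriRef` are hypothetical names. It assumes that everything outside printable ASCII, plus the few ASCII characters that may not appear literally between angle brackets, is written as a numeric escape.

```java
class IriEscaper {
    // ASCII characters that may not appear unescaped inside <...> in Turtle or N-Triples.
    private static final String FORBIDDEN = "<>\"{}|^`\\";

    // Wraps the IRI in angle brackets and replaces problematic characters
    // with numeric escapes (4 hex digits, or 8 for code points above the BMP).
    static String toEscapedIriRef(String iri) {
        StringBuilder sb = new StringBuilder("<");
        iri.codePoints().forEach(cp -> {
            boolean printableAscii = cp > 0x20 && cp < 0x7F;
            if (printableAscii && FORBIDDEN.indexOf(cp) < 0) {
                sb.appendCodePoint(cp);
            } else if (cp <= 0xFFFF) {
                sb.append(String.format("\\u%04X", cp));
            } else {
                sb.append(String.format("\\U%08X", cp));
            }
        });
        return sb.append('>').toString();
    }

    public static void main(String[] args) {
        // The en dash is written as a Java escape here; the output contains a literal backslash escape.
        System.out.println(toEscapedIriRef(
                "http://yago-knowledge.org/resource/Italian_War_of_1494\u201398"));
    }
}
```

Whether the escaped IRI is also a valid IRI is a separate question; this sketch only addresses the syntax of the serialized document.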

It might be worth logging a bug report with the endpoint maintainers with a proposed fix to this effect.

As a workaround, the endpoint's N-Triples output does seem to be syntactically correct. You can force the server to respond with N-Triples instead of Turtle by overriding the standard Accept header that RDF4J's SPARQLRepository sends, like so:

    import java.util.HashMap;
    import java.util.Map;
    import org.eclipse.rdf4j.repository.sparql.SPARQLRepository;

    // 'endpoint' is the URL of the SPARQL endpoint, as in your own code
    SPARQLRepository repo = new SPARQLRepository(endpoint);

    // create a new map of additional http headers
    Map<String, String> headers = new HashMap<>();

    // we set the Accept header to _only_ accept text/plain, forcing the endpoint
    // to use N-Triples as the response format. This overrides the standard
    // Accept header that RDF4J sends.
    headers.put("Accept", "text/plain");
    repo.setAdditionalHttpHeaders(headers);

Once you do that, the rest of your code should work.
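If the goal is only to save the endpoint's N-Triples output to a file, one alternative is to skip the RDF parser and send the SPARQL protocol request yourself. The sketch below uses the JDK's own HTTP client (Java 11 or later) rather than RDF4J. It assumes the endpoint accepts a GET request with a `query` parameter; the endpoint URL, the output file name `naples.nt`, and the names `NTriplesDownload` and `buildRequest` are placeholders, not something from your setup.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;

class NTriplesDownload {
    // Builds a SPARQL protocol GET request that accepts only text/plain,
    // mirroring the Accept header override shown above.
    static HttpRequest buildRequest(String endpoint, String query) {
        String encoded = URLEncoder.encode(query, StandardCharsets.UTF_8);
        return HttpRequest.newBuilder(URI.create(endpoint + "?query=" + encoded))
                .header("Accept", "text/plain")
                .GET()
                .build();
    }

    public static void main(String[] args) throws Exception {
        String endpoint = "https://example.org/sparql"; // placeholder, replace with the real endpoint
        String query = "CONSTRUCT { ?sub ?hasProp ?prop } WHERE { ?sub ?hasProp ?prop "
                + "FILTER(?sub = <http://yago-knowledge.org/resource/Naples>) }";

        // Stream the response body straight into a file, without parsing it.
        HttpResponse<Path> response = HttpClient.newHttpClient()
                .send(buildRequest(endpoint, query),
                        HttpResponse.BodyHandlers.ofFile(Path.of("naples.nt")));
        System.out.println("HTTP " + response.statusCode() + ", saved to " + response.body());
    }
}
```

The trade-off is that nothing validates the data on the way through, so any syntax problems in the endpoint's output end up in the file unnoticed.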

Jeen Broekstra
  • There have been some issues with the Virtuoso serializers, reported 2 years ago ([issue 569](https://github.com/openlink/virtuoso-opensource/issues/569) and [issue 567](https://github.com/openlink/virtuoso-opensource/issues/567)). I thought they had already been fixed, but both tickets are still open. I'm not sure, but at that time prefixed names containing illegal characters were being returned. I used the same workaround and forced a different response format for the DBpedia endpoint. Clearly, the bug should never happen, and it should be fixed in the triple store rather than in client code. – UninformedUser Jan 20 '19 at 08:16
  • I can see that the OP's endpoint is also on version `07.20.3217`, similar to the version for which the bug was reported, so it might have been fixed in newer versions, and the maintainer of the endpoint should update the triple store. – UninformedUser Jan 20 '19 at 08:21
  • If I use Greece instead of Naples in the query, I still get an error: [Rio error] Unexpected character U+22 at index 64: http://yago-knowledge.org/resource/Athens_International_Airport_"Eleftherios_Venizelos" (58685, -1). Anyway, thanks for the clarification. I used a tuple query instead and parsed/iterated the result set myself. – Manos Ntoulias Jan 20 '19 at 09:32
  • @user3161227 fwiw that other error is something that you can configure RDF4J's Rio parser to ignore. You'd need to configure the `VERIFY_URI_SYNTAX` parser setting to `false`. See http://docs.rdf4j.org/programming/#_configuring_the_parser for details. – Jeen Broekstra Jan 21 '19 at 04:49