I'm trying to execute a CONSTRUCT query against Wikidata with RDF4J, using the following code snippet:

String construct = "CONSTRUCT { " +
        "   ?s <http://schema.org/about> ?wikipedia ." +
        "} where { " +
        "   OPTIONAL{ " +
        "      ?wikipedia <http://schema.org/about> ?s ; <http://schema.org/inLanguage> ?language ; <http://schema.org/isPartOf> <https://en.wikipedia.org/> . " +
        "   } " +
        "   ?s ?p1 <http://www.wikidata.org/entity/Q12136> . " +
        "}";
// Query the public Wikidata SPARQL endpoint
Repository repo = new SPARQLRepository("https://query.wikidata.org/sparql");
RepositoryConnection repositoryConnection = repo.getConnection();
GraphQuery query = repositoryConnection.prepareGraphQuery(construct);
GraphQueryResult rs = query.evaluate();
// Iterate over the constructed statements; the exception is thrown from hasNext()
while (rs.hasNext()) {
    Statement statement = rs.next();
}

Unfortunately this results in a parse error:

WARN org.eclipse.rdf4j.rio.helpers.ParseErrorLogger - [Rio error] IRI included an unencoded space: '32' (7730, -1)
org.eclipse.rdf4j.query.QueryEvaluationException: org.eclipse.rdf4j.query.QueryEvaluationException: org.eclipse.rdf4j.rio.RDFParseException: IRI included an unencoded space: '32' [line 7730]
    at org.eclipse.rdf4j.query.impl.QueueCursor.convert(QueueCursor.java:58)
    at org.eclipse.rdf4j.query.impl.QueueCursor.convert(QueueCursor.java:22)
    at org.eclipse.rdf4j.common.iteration.QueueIteration.checkException(QueueIteration.java:165)
    at org.eclipse.rdf4j.common.iteration.QueueIteration.getNextElement(QueueIteration.java:134)
    at org.eclipse.rdf4j.common.iteration.LookAheadIteration.lookAhead(LookAheadIteration.java:81)
    at org.eclipse.rdf4j.common.iteration.LookAheadIteration.hasNext(LookAheadIteration.java:49)
    at org.eclipse.rdf4j.common.iteration.IterationWrapper.hasNext(IterationWrapper.java:63)
    at eu.qanswer.mapping.mappings.informa.Refactor.main(Refactor.java:227)

As far as I understand, Wikidata contains some URIs that are not encoded correctly, i.e. they contain an unencoded space, so the RDF4J parser complains. Is there a way to configure the parser to be less strict?

Thank you D063520

Stanislav Kralin
D063520
  • Honestly, I doubt your query will return anything; it will more likely lead to a timeout. `OPTIONAL` is a left join and thus not commutative, so the order matters. The `OPTIONAL` should come after the triple pattern. Compare yours to [this](https://w.wiki/FuH) - I also tried with Jena and it works without a parser error. – UninformedUser Jan 21 '20 at 11:46
  • Why do you say it times out? Even the link you pasted does not time out but returns a result. – D063520 Jan 21 '20 at 12:07
  • Well, yes - but the link I posted has `OPTIONAL` **after** the triple pattern ... you can try the same with your query in the web UI. I bet you'll get a timeout – UninformedUser Jan 21 '20 at 12:11
  • You were right: the server streams the first results it gets and then sends an error message (timeout), which makes the parser break. Thank you! – D063520 Jan 21 '20 at 12:33

1 Answer

As you discovered, the problem here is that your query times out on the server side. The error message you get from RDF4J is confusing, but the cause is that the server endpoint does not correctly signal that there is a problem: it sends an HTTP 200 response, so RDF4J assumes everything is OK and starts processing the response body. Partway through, the server writes an error into the response body, which then makes the RDF4J parser throw this error.
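
Since the root cause is the timeout, the practical fix suggested in the comments is to move the `OPTIONAL` block after the triple pattern. The following is only a sketch of one such reordering, using the same patterns as the question; the class name `ReorderedConstructQuery` and the method `build()` are hypothetical names chosen for illustration, and whether the query finishes within the endpoint's time limit is not guaranteed.

```java
public class ReorderedConstructQuery {

    // Hypothetical helper: returns the question's query with the triple
    // pattern moved in front of the OPTIONAL block. The patterns themselves
    // are unchanged; only their order differs.
    static String build() {
        return "CONSTRUCT { "
                + "   ?s <http://schema.org/about> ?wikipedia . "
                + "} WHERE { "
                // Bind ?s first with the required triple pattern ...
                + "   ?s ?p1 <http://www.wikidata.org/entity/Q12136> . "
                // ... then left-join the optional Wikipedia article data.
                + "   OPTIONAL { "
                + "      ?wikipedia <http://schema.org/about> ?s ; "
                + "                 <http://schema.org/inLanguage> ?language ; "
                + "                 <http://schema.org/isPartOf> <https://en.wikipedia.org/> . "
                + "   } "
                + "}";
    }

    public static void main(String[] args) {
        // Print the query so it can be pasted into the Wikidata web UI first.
        System.out.println(build());
    }
}
```

The resulting string would be passed to `repositoryConnection.prepareGraphQuery(...)` exactly as in the question. Trying it in the web UI first makes a server-side timeout visible directly, rather than as a parse error in the client.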

Jeen Broekstra