I bulk-loaded the Wikidata dump using tdbloader2, and now I am trying to run SPARQL queries against it. A query like the following runs very slowly (it does not finish even after more than 24 hours), although it works on https://query.wikidata.org/:

    PREFIX      rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    PREFIX       wdt:  <http://www.wikidata.org/prop/direct/>
    PREFIX        wd:  <http://www.wikidata.org/entity/>
    SELECT ?item ?property ?itemLabel
    WHERE
    {
    wd:Q5487302 ?property ?item.
    ?item rdfs:label ?itemLabel.
    FILTER(LANG(?itemLabel) = "" || LANG(?itemLabel) = "en").
    }

However, this one runs quite fast (in less than 5 seconds):

    PREFIX      rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    PREFIX       wdt:  <http://www.wikidata.org/prop/direct/>
    PREFIX        wd:  <http://www.wikidata.org/entity/>
    SELECT ?item ?property ?itemLabel
    WHERE
    {
    wd:Q5487302 ?property ?item.
    }

And this runs fast, too:

    PREFIX      rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    PREFIX       wdt:  <http://www.wikidata.org/prop/direct/>
    PREFIX        wd:  <http://www.wikidata.org/entity/>
    SELECT ?item ?property ?itemLabel
    WHERE
    {
    ?item rdfs:label ?itemLabel.
    FILTER(LANG(?itemLabel) = "" || LANG(?itemLabel) = "en").
    } LIMIT 1000

So I don't know what is wrong with the first query.

  • There is nothing wrong with the query. First of all, I'm sure the triple pattern `?item rdfs:label ?itemLabel.` leads to a very large intermediate result. But I guess the main issue might be with the query optimizer. Did you check this [document](https://jena.apache.org/documentation/tdb/optimizer.html)? Otherwise, I'd also recommend asking on the Jena users mailing list. Devs like Andy, Rob and Adam usually answer pretty fast and can help. Andy is also here, so maybe he'll reply later. – UninformedUser Dec 11 '18 at 09:01
  • In the meantime, you could run the `tdbquery --explain` tool to see what happens under the hood, and maybe add the output to the question. – UninformedUser Dec 11 '18 at 09:02
  • I wonder whether enclosing `wd:Q5487302 ?property ?item` in curly braces would help... – Stanislav Kralin Dec 11 '18 at 09:13
  • Which version of Jena is this? The only odd thing I can see in the first query is that the predicate is a variable (`?property`), and I guess you are using the default BGP optimizations. They may favour `?s rdfs:label` over `:uri ?p`. Try putting a file "none.opt" in the TDB directory, restart and see what happens. Stanislav's suggestion of `{}` is also good: try it around the triple + filter: `{ wd:Q5487302 ?property ?item. { ?item rdfs:label ?itemLabel. FILTER(LANG(?itemLabel) = "" || LANG(?itemLabel) = "en"). } }` – AndyS Dec 11 '18 at 10:09
  • Thanks. Both enclosing the patterns in braces and putting "none.opt" in the TDB directory work. I think my problem comes from the way I was using the optimizer. – Romulus Libertas Dec 11 '18 at 12:19
  • Version? Because I tried with the current codebase and nothing odd happened. – AndyS Dec 11 '18 at 13:00
  • The version is 3.9.0. – Romulus Libertas Dec 12 '18 at 08:06
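
For reference, the nested-group suggestion from the comments can be written out as a full query. This is only a sketch based on AndyS's one-line version: the prefixes are taken from the original question (the unused `wdt:` prefix is left out), and the assumption is that the inner braces keep the default BGP optimizer from reordering the label pattern ahead of the selective `wd:Q5487302` pattern. Whether that holds may depend on the Jena version and on the optimizer file present in the TDB directory.

```sparql
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX wd:   <http://www.wikidata.org/entity/>

SELECT ?item ?property ?itemLabel
WHERE
{
  # Selective pattern: the subject is fixed, so only a few triples match.
  wd:Q5487302 ?property ?item .

  # The label pattern and its filter sit in their own group, so they are
  # joined with the pattern above rather than merged into the same
  # basic graph pattern.
  {
    ?item rdfs:label ?itemLabel .
    FILTER(LANG(?itemLabel) = "" || LANG(?itemLabel) = "en")
  }
}
```

The alternative reported to work in the comments, placing a file named "none.opt" in the TDB directory and restarting, leaves the query text unchanged and instead turns off triple-pattern reordering for that database.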
