TL;DR: Is there a way to download and install, locally, the data and software behind Wikidata's SPARQL endpoint query.wikidata.org? I need this because my queries often run into the endpoint's timeout.

In another SO question, I have read that the software is Blazegraph.

Long version:

I'm running quite heavy queries against the SPARQL query service https://query.wikidata.org. For example, the following query retrieves a list of all chemical compounds, i.e. all items that are instances of Q11173 or of one of its subclasses (or are themselves such a subclass):

SELECT ?item ?boiling_point ?melting_point ?decomp_point ?mass ?smiles
  (GROUP_CONCAT(DISTINCT ?chemFormula; SEPARATOR=", ") AS ?chemFormulae)
  (GROUP_CONCAT(DISTINCT ?chemStructure; SEPARATOR=", ") AS ?chemStructures)
  WHERE {
    ?item wdt:P31/wdt:P279*|wdt:P279* wd:Q11173.

    OPTIONAL { ?item wdt:P2102 ?boiling_point. }
    OPTIONAL { ?item wdt:P2101 ?melting_point. }
    OPTIONAL { ?item wdt:P2107 ?decomp_point. }
    OPTIONAL { ?item wdt:P2067 ?mass. }
    OPTIONAL { ?item wdt:P274 ?chemFormula. }
    OPTIONAL { ?item wdt:P117 ?chemStructure. }
    OPTIONAL { ?item wdt:P233 ?smiles. }
  }
  GROUP BY ?item ?boiling_point ?melting_point ?decomp_point ?mass ?smiles

Here's a direct link.

Since there are over a million such items, this query hits the one-minute timeout, and I don't see how to optimize it: even with all the OPTIONAL properties removed and a LIMIT of 10, the query still times out:

SELECT ?item
  WHERE {
    ?item wdt:P31/wdt:P279*|wdt:P279* wd:Q11173.
  }
  LIMIT 10

Direct link.
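
Since WDQS runs on Blazegraph, one thing I could try is a Blazegraph query hint on the property path (the hint: prefix is predefined on query.wikidata.org). Whether the gearing hint actually helps for this particular path is an assumption I haven't verified, and I've dropped the bare-subclass alternative to keep the sketch simple:

SELECT ?item
  WHERE {
    ?item wdt:P31/wdt:P279* wd:Q11173.
    # hint:Prior attaches to the preceding pattern; "gearing" controls the
    # direction in which Blazegraph expands the *-path. Whether "forward"
    # or "reverse" (starting from the bound object wd:Q11173) is faster
    # here is untested.
    hint:Prior hint:gearing "forward".
  }
  LIMIT 10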

I could query the subclasses individually by writing divide-and-conquer scripts, but before I do that, I wanted to check whether there is a simpler possibility.
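
For reference, the split I have in mind would look roughly like this; a script would repeat the second query for every class returned by the first (wd:Q_SUBCLASS is a placeholder, not a real entity):

# Step 1: enumerate Q11173 itself and all of its transitive subclasses.
SELECT ?class
  WHERE {
    ?class wdt:P279* wd:Q11173.
  }

# Step 2, repeated once per ?class from step 1: fetch its direct instances.
SELECT ?item
  WHERE {
    ?item wdt:P31 wd:Q_SUBCLASS.
  }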

  • Didn't you ask before how to load a Wikidata dump and how long it takes? Download Blazegraph, download the Wikidata dump, and load it. It takes a few days on a powerful server. – UninformedUser Dec 09 '19 at 05:54
  • Yes, sorry for asking again. The original question was deleted because someone serial-downvoted all my posts, and the anti-serial-voting bot corrected only half of those downvotes. – Jonas Sourlier Dec 09 '19 at 08:26
  • OK, I see. But the answers should be the same :D You can set up Blazegraph and load the Wikidata dump. But as I said last time, it takes ~4 days on a good server, IIRC. – UninformedUser Dec 09 '19 at 08:32

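If I end up going the local-Blazegraph route suggested in the comments, a first sanity check after the import could be a plain triple count against the local endpoint (a minimal sketch; even this count may be slow on a full dump):

# A full Wikidata dump should yield a triple count in the billions.
SELECT (COUNT(*) AS ?triples)
  WHERE {
    ?s ?p ?o.
  }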