TL;DR: Is there a way to download and install the data and the software behind Wikidata's SPARQL endpoint query.wikidata.org locally? I need this because my queries often run into timeouts.
In another SO question, I read that the software is Blazegraph.
Long version:
I'm running quite heavy queries against the SPARQL query service https://query.wikidata.org. For example, the following query retrieves a list of all chemical compounds, i.e. items that are instances of Q11173 or of one of its subclasses (as well as the subclasses themselves):
SELECT ?item ?boiling_point ?melting_point ?decomp_point ?mass ?smiles
       (GROUP_CONCAT(DISTINCT ?chemFormula; SEPARATOR=", ") AS ?chemFormulae)
       (GROUP_CONCAT(DISTINCT ?chemStructure; SEPARATOR=", ") AS ?chemStructures)
WHERE {
  # instance of chemical compound (Q11173) or of one of its subclasses,
  # or itself a class in that subclass tree
  ?item wdt:P31/wdt:P279*|wdt:P279* wd:Q11173.
  OPTIONAL { ?item wdt:P2102 ?boiling_point. }
  OPTIONAL { ?item wdt:P2101 ?melting_point. }
  OPTIONAL { ?item wdt:P2107 ?decomp_point. }
  OPTIONAL { ?item wdt:P2067 ?mass. }
  OPTIONAL { ?item wdt:P274 ?chemFormula. }
  OPTIONAL { ?item wdt:P117 ?chemStructure. }
  OPTIONAL { ?item wdt:P233 ?smiles. }
}
GROUP BY ?item ?boiling_point ?melting_point ?decomp_point ?mass ?smiles
Since there are over a million instances, this query hits the one-minute timeout, and I don't see a way to optimize it: even without the optional properties, and with a LIMIT of 10 entries, the query still times out:
SELECT ?item
WHERE {
  ?item wdt:P31/wdt:P279*|wdt:P279* wd:Q11173.
}
LIMIT 10
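Splitting the alternative path into its two branches and running them as separate queries might at least narrow down which part is slow; this is only a diagnostic idea, and I haven't confirmed that either branch stays under the timeout:

# Branch 1: instances of Q11173 or of one of its subclasses
SELECT ?item
WHERE {
  ?item wdt:P31/wdt:P279* wd:Q11173.
}

# Branch 2: classes in the subclass tree of Q11173 (presumably a much smaller set)
SELECT ?item
WHERE {
  ?item wdt:P279* wd:Q11173.
}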
I could query the subclasses individually by writing divide-and-conquer scripts (roughly as sketched below), but before I do that, I wanted to check whether there is a simpler option: running the query service locally, as asked in the TL;DR above.
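For reference, the divide-and-conquer route I have in mind would look roughly like this (untested against the live endpoint, and the per-class queries might still be large for popular classes): first list the classes in the subclass tree of Q11173, then fetch the instances of one class at a time and merge the results in a script.

# Step 1: all classes in the subclass tree of chemical compound (Q11173),
# including Q11173 itself
SELECT ?class
WHERE {
  ?class wdt:P279* wd:Q11173.
}

# Step 2: instances of a single class from step 1, one class per query
# (shown here for Q11173 itself; the other class IDs would be substituted in turn,
# and the remaining OPTIONAL properties from the full query added the same way)
SELECT ?item ?smiles
WHERE {
  ?item wdt:P31 wd:Q11173.
  OPTIONAL { ?item wdt:P233 ?smiles. }
}

That would mean many requests plus deduplication on my side (an item can be an instance of several classes in the tree), which is why a local setup without the timeout looks preferable.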