I'm trying to run the "Cats" Wikidata query locally against a 2016 Wikidata dump (.ttl format):
PREFIX bd: <http://www.bigdata.com/rdf#>
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
SELECT ?item
WHERE
{
?item wdt:P31 wd:Q146.
}
To do this, I run the following command in the terminal:
sparql --data wikidata-20160201-all-BETA.ttl --query cats.rq
I have a Ryzen 5 3600X CPU and 16 GB of RAM, and the query keeps running for minutes on end, using about 70% of the CPU and roughly 4 GB of RAM. The same query on the Wikidata Query Service, which now holds several times more data than the 2016 dump, runs in under 2 seconds, even though it also fetches labels using SERVICE, which my local version does not.
I'm using Apache Jena to run SPARQL queries, and I've been testing mostly on Windows 10. Queries return correct results instantly on small files, such as the ones from Learning SPARQL, so Apache Jena seems to be installed and working. However, I'm a complete novice with knowledge bases, Wikidata, SPARQL, etc., so I may be doing something wrong.
Edit: I got this error message after ~20 minutes:
Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded