I have a series of simple but exhaustive SPARQL queries. Running them against the public WikiData SPARQL endpoint results in timeouts, and setting up a local WikiData instance would be a serious investment that is not worth it here. So I started with a simple solution:
- I use the WikiData SPARQL endpoint to explore the data, tune the query, and evaluate its results, using LIMIT 100 to avoid timeouts.
- Once the query is tuned, I manually translate it into a series of JSONPath queries, Python filters, etc., to run over my local WikiData dump.
- I run them locally. Processing the whole dump sequentially takes a while, but it works.
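For reference, the hand translation in the second step might look roughly like the sketch below. It assumes the standard Wikidata JSON dump layout (a single JSON array with one entity per line) and uses `?item wdt:P31 wd:Q5` (instance of human) as a stand-in query; the function names and the dump file name are illustrative assumptions, not part of the original workflow. The rank handling is an approximation of how `wdt:` only exposes best-rank statements.

```python
import gzip
import json
from typing import Iterable, Iterator, List


def iter_entities(lines: Iterable[str]) -> Iterator[dict]:
    """Yield entities from the Wikidata JSON dump layout: one JSON array,
    one entity per line, every line except the last ending in a comma."""
    for line in lines:
        line = line.strip()
        if line in ("", "[", "]"):
            continue
        yield json.loads(line.rstrip(","))


def best_statements(entity: dict, prop: str) -> List[dict]:
    """Approximate wdt: (truthy) semantics: drop deprecated statements and,
    if any preferred statements exist, keep only those."""
    statements = [s for s in entity.get("claims", {}).get(prop, [])
                  if s.get("rank") != "deprecated"]
    preferred = [s for s in statements if s.get("rank") == "preferred"]
    return preferred or statements


def item_values(entity: dict, prop: str) -> set:
    """Item IDs (e.g. 'Q5') of the best-rank statements for `prop`."""
    ids = set()
    for statement in best_statements(entity, prop):
        datavalue = statement.get("mainsnak", {}).get("datavalue")
        # 'somevalue'/'novalue' snaks carry no datavalue and are skipped.
        if datavalue and datavalue.get("type") == "wikibase-entityid":
            ids.add(datavalue["value"]["id"])
    return ids


def matches(entity: dict) -> bool:
    # Hand translation of: SELECT ?item WHERE { ?item wdt:P31 wd:Q5 . }
    return "Q5" in item_values(entity, "P31")


def scan_dump(path: str) -> Iterator[str]:
    """Stream a gzipped JSON dump and yield the IDs of matching entities."""
    with gzip.open(path, "rt", encoding="utf-8") as f:
        for entity in iter_entities(f):
            if matches(entity):
                yield entity["id"]
```

Usage would be something like `for qid in scan_dump("latest-all.json.gz"): print(qid)` (file name assumed; a `.bz2` dump would need `bz2.open` instead). Every new query means rewriting `matches` by hand, which is exactly the error-prone part.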
The second step is error-prone and time-consuming. Is there an automated way to execute SPARQL queries (or rather, a subset of SPARQL) directly over a local dump without setting up a database?
My SPARQL queries are fairly simple: they select entities based on their properties and values. I do not build large graphs, and I do not use any transitive properties.
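Since the queries only filter entities by property values, the "subset of SPARQL" in question could be as small as a list of single-subject triple patterns. The hypothetical sketch below compiles such a WHERE block into (property, value) constraints and checks them against one dump entity. It only understands `?s wdt:Pnnn wd:Qnnn` and `?s wdt:Pnnn ?var` patterns and ignores SELECT projections, FILTER, OPTIONAL, and everything else; it illustrates the idea rather than serving as a SPARQL engine, and all names in it are assumptions.

```python
import re
from typing import List, Optional, Tuple

# Hypothetical supported subset: triple patterns sharing one subject variable,
#   ?s wdt:Pnnn wd:Qnnn   (property has this item as a value)
#   ?s wdt:Pnnn ?var      (property has some value)
TRIPLE = re.compile(r"\?(\w+)\s+wdt:(P\d+)\s+(?:wd:(Q\d+)|\?\w+)")

Constraint = Tuple[str, Optional[str]]


def compile_query(sparql: str) -> List[Constraint]:
    """Translate the WHERE block into (property, required item or None) pairs."""
    where = sparql[sparql.index("{") + 1 : sparql.rindex("}")]
    triples = TRIPLE.findall(where)
    if len({subject for subject, _, _ in triples}) != 1:
        raise ValueError("expected patterns on exactly one subject variable")
    # findall returns '' for the item group when the object was a ?variable.
    return [(prop, item or None) for _, prop, item in triples]


def best_statements(entity: dict, prop: str) -> List[dict]:
    """Mimic wdt: truthy semantics: drop deprecated, prefer 'preferred' rank."""
    statements = [s for s in entity.get("claims", {}).get(prop, [])
                  if s.get("rank") != "deprecated"]
    return [s for s in statements if s.get("rank") == "preferred"] or statements


def matches(entity: dict, constraints: List[Constraint]) -> bool:
    """True if the entity satisfies every constraint (logical AND)."""
    for prop, item in constraints:
        # Only snaks with a concrete value count; unlike WDQS, 'unknown value'
        # (somevalue) snaks are not treated as matches here.
        values = [s["mainsnak"]["datavalue"] for s in best_statements(entity, prop)
                  if "datavalue" in s.get("mainsnak", {})]
        if item is None:
            if not values:
                return False
        elif not any(v.get("type") == "wikibase-entityid"
                     and v.get("value", {}).get("id") == item for v in values):
            return False
    return True
```

For example, `compile_query('SELECT ?item WHERE { ?item wdt:P31 wd:Q5 . ?item wdt:P569 ?dob . }')` yields `[("P31", "Q5"), ("P569", None)]`, and `matches(entity, constraints)` could then be called on each entity from whatever loop already streams the dump, so the query text itself stays the single source of truth.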