I want to use IPython Notebook to record SPARQL queries together with the results of those queries.
Since any command-line tool can be called from IPython Notebook by prefixing it with a "bang" (!), I can of course run:
!arq --data dcterms.ttl --query test1.rq
or with roqet, I can even embed a short query in the command itself:
!roqet -i sparql -e 'SELECT * WHERE { ?s ?p ?o }' -D dcterms.rdf
Neither arq nor roqet accepts multi-line SPARQL queries as arguments. Any query longer than a one-liner must therefore be stored in a file (e.g., "test1.rq" above).
Far better would be to define SPARQL queries directly in IPython Notebook cells, where they could easily be cloned and tweaked. The following works:
In [4]: myquery = """
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
CONSTRUCT
WHERE {?s rdf:type ?o}
"""
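In [5]: import rdflib
def turtleme(myquery):
    g = rdflib.Graph()
    g.parse('dcam.rdf')
    results = g.query(myquery)
    out = results.serialize(format="turtle")
    # On Python 3 this is bytes; decode it before printing
    print(out.decode("utf-8") if isinstance(out, bytes) else out)
In [6]: turtleme(myquery)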
Out [6]: @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xml: <http://www.w3.org/XML/1998/namespace> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
<http://purl.org/dc/dcam/VocabularyEncodingScheme> a rdfs:Class .
<http://purl.org/dc/dcam/memberOf> a rdf:Property .
However, I do not see a way to pass a SPARQL query that specifies the data sources to be queried, such as:
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
CONSTRUCT
FROM <dcterms.ttl>
FROM <dcam.ttl>
WHERE {?s rdf:type ?o}
or, at a minimum, to improve the function so that it will take at least one filename as an argument, as in
turtleme('dcam.ttl', myquery)
I have scoured Google for examples of using IPython Notebook with SPARQL but have found none. It seems like an obvious use for an environment designed for data exploration. The only method I have found that really works is to run arq on a query file, but then one needs to run
!cat test3.rq
to show the query in IPython Notebook. That does document the process of exploring the data, but every query must then be edited as a separate file, in parallel with the notebook. My objective is to make it easy for beginning students to explore RDF data using SPARQL and to record their explorations in the notebook. There must be a better way!
UPDATE:
@Joshua Taylor and @AndyS point out that both commands do accept multi-line queries as arguments. This works fine at the bash prompt but, unfortunately, not in IPython Notebook, which throws a SyntaxError:
In [5]: !arq --data dcam.ttl '
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dcam: <http://purl.org/dc/dcam/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?s ?p ?o WHERE { ?s ?p ?o . }'
Out [5]: File "<ipython-input-5-c9328c1c0c64>", line 2
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
^
SyntaxError: invalid syntax
If I escape the end of line 1, as in
In [5]: !arq --data dcam.ttl '\
...
Out [5]: File "<ipython-input-18-313c556abc1d>", line 2
PREFIX dcam: <http://purl.org/dc/dcam/>
^
SyntaxError: invalid syntax
However, escaping every line ending does not get the entire command to execute either.
So perhaps the problem lies not with how arq and roqet handle in-line queries but with how IPython Notebook passes those command lines to the shell?
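One possible workaround, sketched under the assumption that `arq` is on the PATH of a Unix-like system and (as the commenters report) accepts the query text as a positional argument: skip the shell and IPython's `!` entirely and call arq through Python's `subprocess` module. Passing an argument list means the multi-line query travels as a single argument with no quoting at all. `arq_command` and `run_arq` are hypothetical helper names, and repeated `--data` options are assumed to load several files.

```python
import subprocess


def arq_command(query, *data_files):
    """Build the argument list for Jena's arq (hypothetical helper).

    Each data file gets its own --data option; the query text, newlines
    and all, is passed as one positional argument, so no shell quoting
    is involved.
    """
    cmd = ["arq"]
    for path in data_files:
        cmd += ["--data", path]
    cmd.append(query)
    return cmd


def run_arq(query, *data_files):
    """Run arq without a shell and return its standard output as text.

    Raises subprocess.CalledProcessError if arq exits with an error.
    """
    completed = subprocess.run(
        arq_command(query, *data_files),
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
        check=True,
    )
    return completed.stdout
```

A notebook cell could then hold the query in a string such as `myquery` and run `print(run_arq(myquery, 'dcam.ttl'))`, keeping the query, the command, and its output together in the notebook.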