
I want to use IPython Notebook to record SPARQL queries together with the results of those queries.

Since any command-line tool can be called from IPython Notebook with a "bang", I can of course run:

!arq --data dcterms.ttl --query test1.rq

or with roqet, I can even embed a short query in the command itself:

!roqet -i sparql -e 'SELECT * WHERE { ?s ?p ?o }' -D dcterms.rdf

Neither arq nor roqet accepts multi-line SPARQL queries as arguments, so any query longer than a one-liner must be stored in a file (e.g., "test1.rq" above).

Far better would be to define SPARQL queries directly in IPython Notebook cells, where they could easily be cloned and tweaked. The following works:

In [4]:   myquery = """
          PREFIX rdf:     <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
          CONSTRUCT
          WHERE {?s rdf:type ?o}
          """

In [5]:   import rdflib

          def turtleme(myquery):
              g = rdflib.Graph()
              g.parse('dcam.rdf')
              results = g.query(myquery)
              # Under Python 3, Result.serialize() returns bytes; decode before printing
              print(results.serialize(format="turtle").decode("utf-8"))

In [6]:   turtleme(myquery)

Out [6]:  @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
          @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
          @prefix xml: <http://www.w3.org/XML/1998/namespace> .
          @prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

          <http://purl.org/dc/dcam/VocabularyEncodingScheme> a rdfs:Class .
          <http://purl.org/dc/dcam/memberOf> a rdf:Property .

However, I do not see a way to pass a SPARQL query that specifies the data sources to be queried, such as:

          PREFIX rdf:     <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

          CONSTRUCT
          FROM <dcterms.ttl>
          FROM <dcam.ttl>
          WHERE {?s rdf:type ?o}

or, at a minimum, to improve the function so that it takes a filename as an argument, as in

        turtleme('dcam.ttl', myquery)

I have searched Google for examples of using IPython Notebook with SPARQL but have found none, even though this seems an obvious use for an environment designed for data exploration. The only method I have found that really works is to run arq, but then one needs to do

        !cat test3.rq

to show the query in IPython Notebook. That does document the process of exploring the data, but every query must then be edited as a separate file, in parallel with the notebook. My objective is to make it easy for beginning students to explore RDF data using SPARQL and to record their explorations in the notebook. There must be a better way!

UPDATE:

@JoshuaTaylor and @AndyS point out that both commands accept multiline queries as arguments. This works fine at the bash prompt but, unfortunately, not in IPython Notebook, which throws a SyntaxError:

In [5]:   !arq --data dcam.ttl '
          PREFIX rdf:     <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
          PREFIX dcam:    <http://purl.org/dc/dcam/>
          PREFIX rdfs:    <http://www.w3.org/2000/01/rdf-schema#>

          SELECT ?s ?p ?o WHERE { ?s ?p ?o . }'

Out [5]:  File "<ipython-input-5-c9328c1c0c64>", line 2
          PREFIX rdf:     <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
                   ^
          SyntaxError: invalid syntax

If I escape the newline at the end of line 1, as in

In [5]:   !arq --data dcam.ttl '\
          ...

Out [5]:  File "<ipython-input-18-313c556abc1d>", line 2
          PREFIX dcam:    <http://purl.org/dc/dcam/>
                    ^
          SyntaxError: invalid syntax

However, I cannot get the entire command to execute even by escaping every line ending.

So perhaps the problem lies not with how arq and roqet handle in-line queries but with how IPython Notebook passes those arq and roqet command lines to the shell?
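One way to sidestep shell quoting altogether, sketched here under the assumptions that arq is on the PATH and Python 3.7+ is in use: since arq takes the query text as an ordinary argument (as AndyS notes in the comments), a notebook cell can call it through Python's `subprocess` module with the whole multi-line query as a single list element. No shell parses the command, so nothing needs quoting or escaping. The helper names `arq_args` and `run_arq` are hypothetical.

```python
import subprocess


def arq_args(query, data_file):
    """Build the argv list for arq (hypothetical helper). The whole multi-line
    query is ONE list element, so no shell quoting or line escaping is needed."""
    return ["arq", "--data", data_file, query]


def run_arq(query, data_file):
    """Run arq (assumed to be on PATH) and return its stdout as text."""
    result = subprocess.run(
        arq_args(query, data_file),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if arq exits non-zero
    )
    return result.stdout


myquery = """
PREFIX rdf:     <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dcam:    <http://purl.org/dc/dcam/>
PREFIX rdfs:    <http://www.w3.org/2000/01/rdf-schema#>

SELECT ?s ?p ?o WHERE { ?s ?p ?o . }
"""

# In a notebook cell (requires arq and dcam.ttl):
#   print(run_arq(myquery, "dcam.ttl"))
```

Because the query lives in a Python string, it can be cloned and tweaked in the notebook just like `myquery` in the rdflib example.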

Tom Baker
  • roqet can accept multiline queries with `-e`. See http://pastebin.com/ZAGy7UhU for an example. Since it accepts multiline queries, and you can specify the data file on the command line, that seems like it would take care of everything? – Joshua Taylor Jun 21 '14 at 13:28
  • When you say you're looking for, at a minimum, a way to be able to `turtleme('dcam.ttl', myquery)`, are you saying that `def turtleme(path,query) … g.parse(path) … g.query(query) …` doesn't work? – Joshua Taylor Jun 21 '14 at 13:31
  • arq accepts queries on the command line (don't use --file) and multiline queries (it's a quoting issue for the ! line, not the command itself). – AndyS Jun 21 '14 at 13:32
  • @AndyS I thought it did, but didn't see the option for it in the `--help` output, which says that usage is `query --data= --query=`. Ah, but the output from `arq` is `No query string or query file`, and `arq 'select * { ... }'` is as expected. So arq/sparql can accept multiline queries, too. – Joshua Taylor Jun 21 '14 at 13:32
  • @JoshuaTaylor - thank you for pointing out that arq and roqet can take multiline queries as command-line arguments. This works in bash but not in IPython Notebook (see added detail above). So perhaps the problem really has to do with how external shell commands are passed from IPython Notebook to the shell? – Tom Baker Jun 21 '14 at 16:23
  • I'm not familiar with IPython notebook, but http://nbviewer.ipython.org/github/ipython/ipython/blob/1.x/examples/notebooks/Cell%20Magics.ipynb#Capturing-output has some examples of multiline bash scripts and the like. Is using `%%bash` an option instead of `!`? – Joshua Taylor Jun 21 '14 at 16:29

2 Answers


In IPython Notebook, preceding a shell command with a bang ("!") works for most commands (e.g., "!date"), but, as noted above, multiline commands are not passed correctly. According to "The cell magics in IPython":

IPython has a %%script cell magic, which lets you run a cell in a subprocess of any interpreter on your system, such as: bash, ruby, perl, zsh, R, etc.

It can even be a script of your own, which expects input on stdin.

To use it, simply pass a path or shell command to the program you want to run on the %%script line, and the rest of the cell will be run by that script, and stdout/err from the subprocess are captured and displayed.

So to pass the query correctly, the IPython Notebook cell must begin with %%script bash (or just %%bash), as in:

In [5]:  %%script bash
         arq --data dcam.ttl '
         PREFIX rdf:     <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
         PREFIX dcam:    <http://purl.org/dc/dcam/>
         PREFIX rdfs:    <http://www.w3.org/2000/01/rdf-schema#>

         CONSTRUCT
         WHERE { ?s rdf:type ?o . }'
Joshua Taylor
Tom Baker

Both sparql/arq and roqet can accept multiline queries; you just need to quote them appropriately. Here's an example:

$ cat data.n3
@prefix : <http://stackoverflow.com/q/24337235/1281433/> .

:sparql :accepts :multiLineQueries.
:roqet :accepts :multiLineQueries.
$ roqet -D data.n3 -e '
select ?s ?p ?o where {
  ?s ?p ?o
}'
roqet: Running query '
select ?s ?p ?o where {
  ?s ?p ?o
}'
roqet: Query has a variable bindings result
result: [s=uri<http://stackoverflow.com/q/24337235/1281433/sparql>, p=uri<http://stackoverflow.com/q/24337235/1281433/accepts>, o=uri<http://stackoverflow.com/q/24337235/1281433/multiLineQueries>]
result: [s=uri<http://stackoverflow.com/q/24337235/1281433/roqet>, p=uri<http://stackoverflow.com/q/24337235/1281433/accepts>, o=uri<http://stackoverflow.com/q/24337235/1281433/multiLineQueries>]
roqet: Query returned 2 results
$ sparql --data data.n3 '
select ?s ?p ?o where {
  ?s ?p ?o
}'
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| s                                                    | p                                                     | o                                                              |
=================================================================================================================================================================================
| <http://stackoverflow.com/q/24337235/1281433/roqet>  | <http://stackoverflow.com/q/24337235/1281433/accepts> | <http://stackoverflow.com/q/24337235/1281433/multiLineQueries> |
| <http://stackoverflow.com/q/24337235/1281433/sparql> | <http://stackoverflow.com/q/24337235/1281433/accepts> | <http://stackoverflow.com/q/24337235/1281433/multiLineQueries> |
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Joshua Taylor
  • It is great to know that arq can accept multiline SPARQL queries as arguments on the bash command line, but unfortunately IPython Notebook seems not to pass those multiline commands to the shell intact. (See modified original post above for details.) – Tom Baker Jun 21 '14 at 16:26