
I need to test a certain structural property of a couple million SPARQL queries, and for that I need the structure of the WHERE clause. I'm currently trying to use fyzz to do this, but unfortunately its documentation is not very helpful. Parsing queries is easy; the problem is that I haven't been able to recover the structure of the clause. For example:

>>> from fyzz import parse
>>> a=parse("SELECT * WHERE {?x a ?y . {?x a ?z}}")
>>> b=parse("SELECT * WHERE {?x a ?y OPTIONAL {?x a ?z}}")
>>> a.where==b.where
True
>>> a.where
[(SparqlVar('x'), ('', 'a'), SparqlVar('y')), (SparqlVar('x'), ('', 'a'), SparqlVar('y'))]

Is there a way to recover the actual parse tree in fyzz instead of just the triples, or is there some other tool that would let me do this? RDFLib seems to have had a bison SPARQL parser in the past, but I can't find it in the rdflib or rdfextras.sparql packages.

Thanks

ailnlv

3 Answers


Another option is roqet, a command-line tool packaged with Rasqal. It prints the parsed structure of a query. For instance:

roqet -i laqrs -d structure -n -e "SELECT * WHERE {?x a ?y OPTIONAL {?x a ?z}}"

would output:

Query:
query verb: SELECT
query bound variables (3): x, y, z
query Group graph pattern[0] {
  sub-graph patterns (2) {
    Basic graph pattern[1] #0 {
      triples {
        triple #0 { triple(variable(x), uri<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>, variable(y)) }
      }
    }
    Optional graph pattern[2] #1 {
      sub-graph patterns (1) {
        Basic graph pattern[3] #0 {
          triples {
            triple #0 { triple(variable(x), uri<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>, variable(z)) }
          }
        }
      }
    }
  }
}

Looking at your comment on the other answer, I don't think this is what you need, and I don't think you will find an answer by looking inside SPARQL parsers. The evaluation of objects (or triple patterns) in a query happens inside query engines, which in well-designed systems are isolated from query parsing.

For instance, in 4store you could run the 4s-query command with the -vvv (very verbose) option, which shows how the query was executed and how substitutions were performed for each triple pattern evaluation.

Manuel Salvadores
  • I know they are; what I need is the object tree that is passed to the query engine. Basically, what I need to check is that, for every subpattern P' of the form (P1 OPTIONAL P2), every variable that occurs in P2 and also outside P' occurs in P1. Given that, I need to apply certain rewrite rules to the original pattern, and for that, having the object tree would be extremely useful. – ailnlv Aug 08 '11 at 20:34
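Once you do have a tree, the check described in the comment (the condition the SPARQL literature calls a "well-designed" OPTIONAL pattern) is a short recursive walk. A minimal sketch, assuming a hypothetical tree of `Triple`, `And` (group/join) and `Opt` (OPTIONAL) nodes; these classes are not part of fyzz, roqet or any other tool mentioned here, so you would first have to convert your parser's output into them. UNION, FILTER and the rest are left out:

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Union


# Hypothetical node types for illustration; not fyzz/roqet/rdflib classes.
@dataclass(frozen=True)
class Triple:
    s: str
    p: str
    o: str


@dataclass(frozen=True)
class And:  # {P1 . P2}
    left: Pattern
    right: Pattern


@dataclass(frozen=True)
class Opt:  # P1 OPTIONAL P2
    left: Pattern
    right: Pattern


Pattern = Union[Triple, And, Opt]


def variables(p: Pattern) -> set[str]:
    """All variables (terms starting with '?') occurring in p."""
    if isinstance(p, Triple):
        return {t for t in (p.s, p.p, p.o) if t.startswith("?")}
    return variables(p.left) | variables(p.right)


def violations(p: Pattern, outside: frozenset[str] = frozenset()) -> list[tuple[Opt, str]]:
    """Return (OPTIONAL node, variable) pairs that break the condition.

    `outside` holds the variables occurring in the whole pattern
    outside the current node p.
    """
    if isinstance(p, Triple):
        return []
    left_vars, right_vars = variables(p.left), variables(p.right)
    found = []
    if isinstance(p, Opt):
        # Variables of P2 that also occur outside P' must occur in P1.
        for v in sorted((right_vars & outside) - left_vars):
            found.append((p, v))
    # Each child sees its sibling's variables as "outside" as well.
    found += violations(p.left, outside | right_vars)
    found += violations(p.right, outside | left_vars)
    return found


def is_well_designed(p: Pattern) -> bool:
    return not violations(p)


# {?x a ?y OPTIONAL {?x a ?z}}  (the query from the question)
ok = Opt(Triple("?x", "a", "?y"), Triple("?x", "a", "?z"))
# {{?x :p ?y OPTIONAL {?x :q ?z}} . ?z :r ?v}: ?z is used outside the
# OPTIONAL but does not occur in its left side
bad = And(Opt(Triple("?x", ":p", "?y"), Triple("?x", ":q", "?z")),
          Triple("?z", ":r", "?v"))
print(is_well_designed(ok), is_well_designed(bad))  # True False
```

Returning the offending (node, variable) pairs rather than a plain boolean gives the rewrite step something to act on.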

ANTLR has a SPARQL grammar here: http://www.antlr.org/grammar/1200929755392/index.html

ANTLR can generate a Python parser from that grammar.

Ned Batchelder
  • Thanks, but what I need is a bit more complex than that; I need to get the object that is going to be evaluated over the database by the query engine, without shorthands like `;`. This must already be done somewhere, and I'd like to avoid the work of preprocessing the parse tree. – ailnlv Aug 08 '11 at 16:42
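If you do end up post-processing a raw parse tree yourself, expanding the `;` and `,` shorthands (and `a` as rdf:type) into explicit triples is a small step. A minimal sketch, assuming the parser hands you a subject plus a list of (predicate, objects) pairs; that input shape and the `expand_property_list` helper are hypothetical, and real parsers structure this differently:

```python
from __future__ import annotations

# Full IRI that the SPARQL keyword `a` abbreviates.
RDF_TYPE = "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>"


def expand_property_list(subject: str, pairs: list[tuple[str, list[str]]]) -> list[tuple[str, str, str]]:
    """Expand `subject p1 o1 , o2 ; p2 o3` into explicit triples.

    `pairs` is a hypothetical pre-parsed shape: one (predicate, objects)
    tuple per `;`-separated item, where `objects` holds the
    `,`-separated objects for that predicate.
    """
    triples = []
    for predicate, objects in pairs:
        # `a` in predicate position is shorthand for rdf:type.
        full_predicate = RDF_TYPE if predicate == "a" else predicate
        for obj in objects:
            triples.append((subject, full_predicate, obj))
    return triples


# ?x a ?y , ?w ; :p ?z
print(expand_property_list("?x", [("a", ["?y", "?w"]), (":p", ["?z"])]))
```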

Try using rdflib.plugins.sparql.parser.parseQuery.

Wai Ha Lee
  • This does not provide an answer to the question. Once you have sufficient [reputation](https://stackoverflow.com/help/whats-reputation) you will be able to [comment on any post](https://stackoverflow.com/help/privileges/comment); instead, [provide answers that don't require clarification from the asker](https://meta.stackexchange.com/questions/214173/why-do-i-need-50-reputation-to-comment-what-can-i-do-instead). - [From Review](/review/late-answers/30495239) – jtbandes Dec 04 '21 at 07:31