0

I use f-strings to build SPARQL queries with a variable and they work well as follows:

for label in longnames:
    sparqlquery = f"""
    PREFIX osr:<http://dati.senato.it/osr/>
    SELECT * WHERE {{
        ?uri a osr:Senatore. 
        ?uri rdfs:label "{label}"^^xsd:string.
    }}
    """

which would run N queries with each having a different label value coming from a longnames list[str].

I would like to externalize these SPARQL queries to a separate file and therefore tried to create a sparqlqueries.py file as follows:

DBP_LANGS = """
    PREFIX dbo: <http://dbpedia.org/ontology/>
    PREFIX sdo: <https://schema.org/>

    CONSTRUCT {
      ?lang a sdo:Language ;
      sdo:alternateName ?iso6391Code .
    }
    WHERE {
      ?lang a dbo:Language ;
      dbo:iso6391Code ?iso6391Code .
      FILTER (STRLEN(?iso6391Code)=2) # to filter out non-valid values
    }
    LIMIT {QUERY_LIMIT}
"""

(edit: as noted in the comments that is not an f-string but a simple string literal. Could not use the f-string as the QUERY_LIMIT variable is unknown in that module)

and in my new sample.py code I tried:

from SPARQLWrapper import SPARQLWrapper
from sparqlqueries import DBP_LANGS

sparql = SPARQLWrapper("http://dbpedia.org/sparql")

QUERY_LIMIT = 5
print(DBP_LANGS) # just to debug
sparql.setQuery(DBP_LANGS)

but this SPARQL query fails and as the previous print shows, the QUERY_LIMIT is not interpolated at all in the f-string:

PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX sdo: <https://schema.org/>

CONSTRUCT {
  ?lang a sdo:Language ;
  sdo:alternateName ?iso6391Code .
}
WHERE {
  ?lang a dbo:Language ;
  dbo:iso6391Code ?iso6391Code .
  FILTER (STRLEN(?iso6391Code)=2) # to filter out non-valid values
}
LIMIT {QUERY_LIMIT}

and therefore understandably the query fails.

Is there a way to "force" the f-string to be re-interpolated in the current code?

Robert Alexander
  • 875
  • 9
  • 24
  • 8
    Why not just use [`format`](https://docs.python.org/3/library/stdtypes.html#str.format)? i.e. `DBP_LANGS.format(QUERY_LIMIT=QUERY_LIMIT)` – Tomerikoo Feb 06 '23 at 16:28
  • 2
    (I mean, I don't think this can be done with f-strings as they are string ***literals***, which means evaluated at assignment...) – Tomerikoo Feb 06 '23 at 16:33
  • 2
    You shouldn't be using string formatting for SQL queries. Use placeholders in the SQL and parameters in `cursor.execute()`. – Barmar Feb 06 '23 at 16:41
  • 1
    This isn't SQL, though similar arguments may apply to SPARQL. – chepner Feb 06 '23 at 16:43
  • f-string is a particular syntax to use to define `str` values. `DBP_LANGS` is defined using an *ordinary* string literal, and the result is indistinguishable from a `str` value defined using some other method. – chepner Feb 06 '23 at 16:44
  • 1
    It's the `f` prefix that makes a literal an f-string, not the presence of any `{...}` fields. `f'3'` is an f-string that produces a `str` value containing the single character `3`. `'{}'` is regular literal that produces a `str` value containing the pair `{}`. (The latter is suitable for use with the `format` method, e.g. `'{}'.format(3) == '3'`.) – chepner Feb 06 '23 at 16:46
  • OMG lol yes I had forgotten the f prefix to the f-string, sorry. If I do fix this problem and correspondingly double all literal curly braces in the syntax, the module containing it will rightly complain that it doesn't know the QUERY_LIMIT variable – Robert Alexander Feb 06 '23 at 16:48
  • @Tomerikoo thanks but the format line throws a Keyerror error: Exception has occurred: KeyError '\n ?lang a sdo' File "/Users/bob/Documents/work/code/linkloc/src/linkloc/sample1.py", line 8, in test = DBP_LANGS.format(QUERY_LIMIT=QUERY_LIMIT) KeyError: '\n ?lang a sdo' – Robert Alexander Feb 06 '23 at 16:55
  • Because you have more curly braces than placeholders. You will have to escape them (just like you did in the f-string...) – Tomerikoo Feb 06 '23 at 19:10
  • @Tomerikoo correct, thanks! But meanwhile this question was hastily closed as a duplicate (it wasn't since it's about SPARQL not SQL and about an imported string) I stumbled on a simpler/cleaner solution, string Templates :) I just use a ?varname in the simple string and before I use it I declare it as a Template and use .substitute(limit=10) or whatever. It's probably also more readable since the template variable can convey a meaning. Thanks a lot though – Robert Alexander Feb 06 '23 at 19:44
  • 1
    Barmer’s comment was way too soft, and I am afraid you are not paying it the proper attention. **You must not use f-strings or `format` to format SQL queries!** Doing this harbours a security risk that is responsible for a large part of all security vulnerabilities worldwide. To interpolate values into SQL or SQL-like constructs, *only* use the query preparation API provided by the database, nothing else. The database API will take care of properly quoting the values, so that injection attacks are prevented. – Konrad Rudolph Feb 07 '23 at 07:50
  • Hi @KonradRudolph thanks for your input. You are correct in pointing out the dangers of SQL injection but in my case a) this is not SQL and b) the variables are not influenced by any user input. As far as I know the Apache Jena SPARQL server does support your suggestion https://jena.apache.org/documentation/query/parameterized-sparql-strings.html but I don't think there's an advantage in doing my own substitution with my own variable checking. – Robert Alexander Feb 07 '23 at 13:11
  • 1
    @RobertAlexander For the purpose of injection it doesn't matter whether the language is SQL, something similar or even something wildly distinct. And even if the *current* input isn't user supplied it's best practice to follow the same workflow. The advantage of always doing it this way is two-fold. Firstly, doing things the right way is habit-forming and makes it much less likely to accidentally write incorrect code when it *does* matter. And it makes it easier to audit code, because any instance of the pattern you employ here can be rejected automatically without having to analyse details. – Konrad Rudolph Feb 07 '23 at 15:18
  • 1
    And a third bonus reason: there is no downside to doing it "correctly" here. – Konrad Rudolph Feb 07 '23 at 15:18
  • Thanks @KonradRudolph, good points. Will need to study the Apache Jena way of doing it, even though I do see one downside. The templating might be different depending on the "database" engine and therefore the templating mechanism with Jena might break if you then change that component, right? – Robert Alexander Feb 07 '23 at 19:32
  • 1
    @Robert The templating mechanism should abstract the database engine; that is part of why it exists in the first place. I have no specific experience with SPARQL or Apache Jena but for relational databases you have standardised APIs such as ODBC to ensure that changing the underlying database does not break any code (except that DBSes do not implement SQL identically so there are still other issues). However, I don't know to what extent ARQ in particular implements this for SPARQL. – Konrad Rudolph Feb 08 '23 at 08:30
  • @KonradRudolph really appreciate your teachings. Very useful. Thanks a lot. – Robert Alexander Feb 08 '23 at 08:58

0 Answers0