
What is a good way to hook a custom SPARQL function into rdflib?

I have been looking around in rdflib for an entry point for custom functions. I found no dedicated entry point, but rdflib.plugins.sparql.CUSTOM_EVALS looks like a possible place to add a custom function.

So far I have made an attempt with the code below. It seems "dirty" to me: I am calling a "hidden" function (_eval), and I am not sure I got all the argument handling correct. Beyond the custom_eval.py example code (which forms the basis for my code), I found little other code or documentation about CUSTOM_EVALS.

import rdflib
from rdflib.plugins.sparql.evaluate import evalPart
from rdflib.plugins.sparql.sparql import SPARQLError
from rdflib.plugins.sparql.evalutils import _eval
from rdflib.namespace import Namespace
from rdflib.term import Literal

NAMESPACE = Namespace('//custom/')
LENGTH = rdflib.term.URIRef(NAMESPACE + 'length')

def customEval(ctx, part):
    """Evaluate custom function."""
    if part.name == 'Extend':
        cs = []
        for c in evalPart(ctx, part.p):
            if hasattr(part.expr, 'iri'):
                # A function
                argument = _eval(part.expr.expr[0], c.forget(ctx, _except=part.expr._vars))
                if part.expr.iri == LENGTH:
                    e = Literal(len(argument))
                else:
                    raise SPARQLError('Unhandled function {}'.format(part.expr.iri))
            else:
                e = _eval(part.expr, c.forget(ctx, _except=part._vars))
                if isinstance(e, SPARQLError):
                    raise e
            cs.append(c.merge({part.var: e}))
        return cs
    raise NotImplementedError()


QUERY = """
PREFIX custom: <%s>

SELECT ?s ?length WHERE {
  BIND("Hello, World" AS ?s)
  BIND(custom:length(?s) AS ?length)
}
""" % (NAMESPACE,)

rdflib.plugins.sparql.CUSTOM_EVALS['exampleEval'] = customEval
for row in rdflib.Graph().query(QUERY):
    print(row)
Finn Årup Nielsen
  • As an extra note, I would like to mention that in other SPARQL implementations the definition of custom functions seems simpler; see, e.g., https://stackoverflow.com/questions/16280758/logarithm-function-in-sparql-query – Finn Årup Nielsen May 15 '17 at 15:14
  • That it seems simpler elsewhere doesn't matter much; it was probably a design decision, or support was added to RDFLib later. If your code works, why not continue with your project :D But maybe some RDFLib expert knows more. Have you tried asking the developers? What about the example implementation in `examples/custom_eval.py`? – UninformedUser May 16 '17 at 06:19
  • Regarding `examples/custom_eval.py`: my example was actually developed from this Python file. – Finn Årup Nielsen May 16 '17 at 14:11
  • I now see that there have been commits and issues on rdflib's GitHub page related to my question: https://github.com/RDFLib/rdflib/pull/723/commits/5634e2a9f7b32dee71a77f4f87e934a6a2f24e36 I see this was done in March 2017 by https://stackoverflow.com/users/1235487/pierre-antoine but it is still an open pull request. – Finn Årup Nielsen May 16 '17 at 14:54
  • Sounds good. Then you could fork the project and apply the pull request on your fork. – UninformedUser May 17 '17 at 05:11

1 Answer


So first off, I want to thank you for showing how you implemented a new SPARQL function.

Secondly, using your code I was able to create a SPARQL function that compares two strings using the Levenshtein distance. It has been really insightful, and I wish to share it because it includes additional documentation that could help other developers create their own custom SPARQL functions.

# Import needed to introduce new SPARQL function
import rdflib
from rdflib.plugins.sparql.evaluate import evalPart
from rdflib.plugins.sparql.sparql import SPARQLError
from rdflib.plugins.sparql.evalutils import _eval
from rdflib.namespace import Namespace
from rdflib.term import Literal

# Import for custom function calculation
from Levenshtein import distance as levenshtein_distance # python-Levenshtein==0.12.2



def SPARQL_levenshtein(ctx:object, part:object) -> object:
    """
    The first two arguments of the custom function are compared using the Levenshtein distance.
    The distance value is then stored in a Literal object and bound to the target variable of the BIND.
    
    Example:

    Query:
        PREFIX custom: <//custom/>    # Note: this prefix refers to the namespace of the custom function

        SELECT ?label1 ?label2 ?levenshtein WHERE {
          BIND("Hello" AS ?label1)
          BIND("World" AS ?label2)
          BIND(custom:levenshtein(?label1, ?label2) AS ?levenshtein)
        }

    Retrieve:
        ?label1 ?label2

    Calculation:
        levenshtein_distance(?label1, ?label2) =  distance

    Output:
        Save distance in Literal object.

    :param ctx:     <class 'rdflib.plugins.sparql.sparql.QueryContext'>
    :param part:    <class 'rdflib.plugins.sparql.parserutils.CompValue'>
    :return:        list of solution bindings (rdflib.plugins.sparql.sparql.FrozenBindings)
    """

    # Only BIND (algebra node 'Extend') is handled here. For every other node,
    # raising NotImplementedError makes rdflib fall back to its default evaluation.
    if part.name == 'Extend':
        cs = []

        # evalPart yields one solution (a set of variable bindings) at a time
        for c in evalPart(ctx, part.p):

            # Function calls named by an IRI (internationalized resource identifier)
            # carry an 'iri' attribute; other expressions do not.
            if hasattr(part.expr, 'iri'):

                # Check for our levenshtein IRI (example: //custom/levenshtein)
                # before touching the arguments, so that functions with fewer
                # than two arguments do not fail with an IndexError.
                # The module-level names are defined further down:
                #     namespace = Namespace('//custom/')
                #     levenshtein = rdflib.term.URIRef(namespace + 'levenshtein')
                if part.expr.iri == levenshtein:

                    # Evaluate the two arguments, for example ?label1 and ?label2
                    argument1 = str(_eval(part.expr.expr[0], c.forget(ctx, _except=part.expr._vars)))
                    argument2 = str(_eval(part.expr.expr[1], c.forget(ctx, _except=part.expr._vars)))

                    # Calculate the Levenshtein distance and store it as a Literal object.
                    # This object is then bound to the output variable (example: ?levenshtein)
                    evaluation = Literal(levenshtein_distance(argument1, argument2))

                else:
                    raise SPARQLError('Unhandled function {}'.format(part.expr.iri))
            else:
                # Standard evaluation and error handling for every other BIND expression
                evaluation = _eval(part.expr, c.forget(ctx, _except=part._vars))
                if isinstance(evaluation, SPARQLError):
                    raise evaluation
            cs.append(c.merge({part.var: evaluation}))
        return cs
    raise NotImplementedError()


namespace = Namespace('//custom/')
levenshtein = rdflib.term.URIRef(namespace + 'levenshtein')


query = """
PREFIX custom: <%s>

SELECT ?label1 ?label2 ?levenshtein WHERE {
  BIND("Hello" AS ?label1)
  BIND("World" AS ?label2)
  BIND(custom:levenshtein(?label1, ?label2) AS ?levenshtein)
}
""" % (namespace,)

# Save custom function in custom evaluation dictionary.
rdflib.plugins.sparql.CUSTOM_EVALS['SPARQL_levenshtein'] = SPARQL_levenshtein


for row in rdflib.Graph().query(query):
    print(row)
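
The listing above relies on the third-party python-Levenshtein package. If that dependency is unwanted, a pure-Python stand-in could look roughly like the sketch below. The name `levenshtein_distance_py` is hypothetical, and this is the textbook dynamic-programming approach rather than the package's own implementation, so expect it to be slower on long strings.

```python
def levenshtein_distance_py(a: str, b: str) -> int:
    """Return the Levenshtein (edit) distance between strings a and b."""
    # previous[j] is the distance between the already processed prefix of a and b[:j].
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        # Turning a[:i] into the empty string takes i deletions.
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute (free if the characters match)
            ))
        previous = current
    return previous[-1]


print(levenshtein_distance_py("Hello", "World"))  # prints 4
```

It could be swapped in by replacing the `levenshtein_distance` import with this function; the rest of the listing would stay the same.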

To answer your question: "What is a good way to hook a custom SPARQL function into rdflib?"

Currently I'm developing a class that handles RDF data, and I believe it might be best to put the registration code into the __init__ function.

For example:

class ClassName():
    """DOCSTRING"""

    def __init__(self):
        """DOCSTRING"""
        # Save custom function in custom evaluation dictionary.
        rdflib.plugins.sparql.CUSTOM_EVALS['SPARQL_levenshtein'] = SPARQL_levenshtein 

Please note that this SPARQL function only works where it has been registered, that is, for queries evaluated by rdflib in the same Python process. Even though the SPARQL syntax in the query is correct, the function cannot be used in SPARQL queries sent to remote databases such as DBpedia. The DBpedia endpoint does not support this custom function (yet).