
Is there a way using rdflib or a similar package to validate a set of elements?

e.g.

from rdflib import Graph, Namespace, Literal
from rdflib.namespace import DCTERMS

n = Namespace("http://example.org/books/")
n.book

g = Graph()
g.bind("dc", DCTERMS)
g.bind("ex", n)
g.add((n["book"], DCTERMS["title"], Literal("Example Title"))) # Valid
g.add((n["book"], DCTERMS["tite"], Literal("Example Title"))) # Invalid

Or how it would look as a .ttl file:

@prefix dc: <http://purl.org/dc/terms/> .
@prefix ex: <http://example.org/books/> .

ex:book dc:title "Example Title" . # Valid
ex:book dc:tite "Example Title" . # Invalid

There's a good chance I'm approaching this from the wrong angle entirely so any help is appreciated.

jarthur
  • why is the second triple "invalid"? Because of the typo? If so, how would a framework know which is correct? I mean, you as a human can easily decide on it, but for `rdflib` those are just URIs. All I can think of is to check if the property is part of the DCTerms vocabulary - which is trivial to check, right? The question is, what about other properties from unknown vocabularies? – UninformedUser Sep 25 '20 at 07:43
  • @UninformedUser - That's kind of what I'm asking. Obviously, `rdflib` can't determine what is valid/invalid on its own. You said one way is to check if the property is part of the DCTerms vocabulary and that this is trivial but I'm actually not sure how to do this easily (very new to this space). And yes, my next question would be a way to do this for custom namespaces. I thought there might be some 'whitelist' of properties we could pass in to `rdflib` but I don't think this is going to be the case. – jarthur Sep 27 '20 at 22:40
  • No, that's not part of `rdflib` - I also think this would be useful. One could argue, a vocabulary might be closed, but in the end the whole Semantic Web idea is to be open - you can use any namespace for your own URIs - whether this makes sense is indeed a different story. – UninformedUser Sep 28 '20 at 05:39
  • Long story short, I was wrong with the suggestion to iterate. Not all namespaces are closed in `rdflib`. Check the source code here: https://github.com/RDFLib/rdflib/blob/master/rdflib/namespace.py - for example, FOAF is closed, so you could simply check if the predicate is among the defined terms, but DCTERMS is not. So you have to do some work in advance and provide your own whitelist. For your custom vocabularies you could indeed create a [`ClosedNamespace`](https://github.com/RDFLib/rdflib/blob/master/rdflib/namespace.py#L175) - this will raise an error if a term is not among the pre-defined terms. – UninformedUser Sep 28 '20 at 05:46
  • @UninformedUser Thanks for the information -- especially pointing me towards `ClosedNamespace`. I ended up essentially rolling my own version of it but will now switch to this instead. If you wanted to add this as an answer I'd happily accept it. – jarthur Sep 28 '20 at 06:06
  • Just so you know: we (maintainers of RDFlib) are planning on changing the way Namespace / ClosedNamespace works in rdflib 6.0.0 to have namespaces indicate a problem when an invalid term is used, but perhaps not raise an error as such. And we will likely add a lot more closed namespaces to rdflib - like hundreds - whenever there's a good chance that no new terms are being added. We will likely add DCTERMS in this way. – Nicholas Car Sep 30 '20 at 12:55

1 Answer

@UninformedUser commented:

check if the property is part of the DCTerms vocabulary - which is trivial to check

  1. download the vocabulary - either in code or separately
  2. load the vocabulary into another RDFlib Graph
  3. check to see if any property you're using is an rdf:Property in the other graph, as per:
if (the_property_being_tested, RDF.type, RDF.Property) not in graph:
    print(f"{the_property_being_tested} is not declared as an rdf:Property")

Or, just see if it's a subject in the graph (looser test):

# `graph` is the vocabulary graph loaded in step 2
in_graph = False
for s, p, o in graph.triples((the_property_being_tested, None, None)):
    in_graph = True
    break  # one matching triple is enough
Nicholas Car