
I have the following combined data and shapes graph:

@prefix hr: <http://learningsparql.com/ns/humanResources#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xml: <http://www.w3.org/XML/1998/namespace> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix schema: <http://schema.org/> .
@prefix sh: <http://www.w3.org/ns/shacl#> .

hr:Employee a rdfs:Class .
hr:BadThree rdfs:comment "some comment about missing" .
hr:BadTwo a hr:BadOne .
hr:YetAnother a hr:Another .
hr:YetAnotherName a hr:AnotherName .
hr:Another a hr:Employee .
hr:AnotherName a hr:name .
hr:BadOne a hr:Dangling .
hr:name a rdf:Property .

schema:SchemaShape
    a sh:NodeShape ;
    sh:target [
        a sh:SPARQLTarget ;
        sh:prefixes hr: ;
        sh:select """
            SELECT ?this
            WHERE {
                ?this ?p ?o .
            }
            """ ;
    ] ; 

    sh:property [                
        sh:path rdf:type ;
        sh:nodeKind sh:IRI ;
        sh:hasValue rdfs:Class
    ] ; 
.

Using pySHACL:

import rdflib

from pyshacl import validate

full_graph = open( "/Users/jamesh/jigsaw/shacl_work/data_graph.ttl", "r" ).read()

g = rdflib.Graph().parse( data = full_graph, format = 'turtle' )

report = validate( g, inference='rdfs', abort_on_error = False, meta_shacl = False, debug = False )
print( report[2] )

What I expect to happen is that the SPARQL-based target selects every subject in the data graph, and the property shape then verifies that each one has an rdf:type path whose value is rdfs:Class.

I get the following result:

Validation Report
Conforms: True

The expected validation errors should include only the following subjects:

| <http://learningsparql.com/ns/humanResources#BadOne>         |
| <http://learningsparql.com/ns/humanResources#BadTwo>         |
| <http://learningsparql.com/ns/humanResources#BadThree>       |
| <http://learningsparql.com/ns/humanResources#AnotherName>    |
| <http://learningsparql.com/ns/humanResources#name>           |
| <http://learningsparql.com/ns/humanResources#YetAnotherName> |

Is this possible with SHACL? If so, what should the shape file be?

James Hudson
  • SPARQL based target needs enabling advanced features as this isn't part of SHACL core. Try to add `advanced = True` to the validate method. At least this is one of the issues, not sure about your shape in general. – UninformedUser Apr 13 '20 at 18:21
  • That did help in that I now get validation errors. However, it is generating a validation error for hr:Another because it is not of type rdfs:Class. But it should not be a validation error because it is a subclass of hr:Employee which is of type rdfs:Class. I am also getting strange validation errors on ```sh:focusNode "some comment about missing" ;``` and others which shouldn't be there. I can do this validation with straight SPARQL and would love to know how to do it with SHACL. – James Hudson Apr 13 '20 at 18:51
  • Placed the current validations errors at https://gist.github.com/James-Hudson3010/e67e3766ccb828d51fa9a11849db7d2a – James Hudson Apr 13 '20 at 18:52
  • I still do not understand your data modeling, but if you want to follow arbitrary paths, you also have to state this in the SHACL shape, so use `sh:path ( rdf:type [ sh:zeroOrMorePath rdf:type ] )` – UninformedUser Apr 14 '20 at 09:19
  • hr:YetAnother is a subclass of hr:Another which is a subclass of hr:Employee which is a subclass of rdfs:Class. They should all validate. Everything else should not because there is no rdf:type property path to rdfs:Class. What should the shape file look like to report the subjects which do not validate? I will take a look at sh:zeroOrMorePath. Thank you. – James Hudson Apr 14 '20 at 11:42
  • While it seems clear that sh:zeroOrMorePath is going to be part of the solution, I am still getting weird validation errors on objects like ```sh:focusNode "some comment about missing" ;``` and even a validation error on my SPARQL target query, among other strange ones. The latest set of validation errors, along with the Data & Shape graph, is at https://gist.github.com/James-Hudson3010/b6383ce102a188358fef1177555ad781 – James Hudson Apr 14 '20 at 11:53
  • *"hr:YetAnother is a subclass of hr:Another which is a subclass of hr:Employee which is a subclass of rdfs:Class."* - and exactly this is not true according to your data. You have a chain of `rdf:type` properties but never use `rdfs:subClassOf` - and this is unusual from a common modeling point of view - but I don't care, maybe it's just how you need it in your data. The only thing to keep in mind: this won't work with inference as expected, at least none of the RDFS rules would consider paths of `rdf:type` for inference. – UninformedUser Apr 14 '20 at 14:04
  • Ok. While I can do what I want with SPARQL alone, it is not possible to do with SHACL? (pretend I said "type of" instead of "subclass of") – James Hudson Apr 14 '20 at 14:26
  • I can find several examples of using rdf:type in the same way I am in the http://schema.org vocabulary. For example, http://schema.org/Ear is a type of http://schema.org/PhysicalExam which is a type of rdfs:Class. – James Hudson Apr 14 '20 at 14:37
  • can you explain what you mean by " I am still getting weird validation errors on objects"? – UninformedUser Apr 14 '20 at 14:50
  • I do not understand why I am getting those validation errors or how to get the validation errors I want. – James Hudson Apr 14 '20 at 14:51
  • You have inference enabled, which is clearly leading to more axiomatic triples, which lead to more entities, which lead to more focus nodes than your raw data contains. Disable it, please. Also, you put shapes and data into the same file - it's obvious that `select * where { ?s ?p ?o}` will also return triples about the shape, isn't it? Split your file into two and provide both as the appropriate arguments. Note, I did this and got exactly the violations you want: i) I did **not** use inference and ii) I split your example into a shape file and a data file. – UninformedUser Apr 14 '20 at 14:54
  • Maybe this helps, if not I don't know and some SHACL experts must help here. Also possible that I misunderstand SHACL and the PySHACL API. I'm neither used to SHACL nor to PySHACL nor to most of the RDF stuff you seem to need in your project – UninformedUser Apr 14 '20 at 14:55
  • I believed my usage of ```sh:prefixes hr: ;``` would restrict the validation to just the hr: triples. I guess it does not work the way I thought it did. I suppose I do not understand what "inference" does either. Regardless, it does help. I can get the results I want by splitting. The reason why I thought they should not be split is due to a misunderstanding of https://github.com/RDFLib/pySHACL/issues/46 – James Hudson Apr 14 '20 at 15:18

1 Answer


The code below produces the expected validation errors, although there are still several things I do not fully understand.

  1. The `sh:prefixes hr: ;` line is not needed. It only supplies prefix declarations for the SPARQL target's SELECT query; it does not restrict which nodes are validated.

  2. Inference needed to be disabled. With `inference='rdfs'`, inferred triples were inserted into the data graph and then validated as well. In this use case that is not desired: only what is actually in the schema should be validated.

  3. I also assumed that putting everything into a single graph would not be an issue, apparently because I misunderstood https://github.com/RDFLib/pySHACL/issues/46. The data and the shapes need to be in separate graphs; otherwise the target query also selects the shape's own nodes.

import rdflib
from pyshacl import validate

graph_data = """
@prefix hr: <http://learningsparql.com/ns/humanResources#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xml: <http://www.w3.org/XML/1998/namespace> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix schema: <http://schema.org/> .
@prefix sh: <http://www.w3.org/ns/shacl#> .

hr:Employee a rdfs:Class .
hr:BadThree rdfs:comment "some comment about missing" .
hr:BadTwo a hr:BadOne .
hr:YetAnother a hr:Another .
hr:YetAnotherName a hr:AnotherName .
hr:Another a hr:Employee .
hr:AnotherName a hr:name .
hr:BadOne a hr:Dangling .
hr:name a rdf:Property .
"""

shape_data = '''
@prefix hr: <http://learningsparql.com/ns/humanResources#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xml: <http://www.w3.org/XML/1998/namespace> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix schema: <http://schema.org/> .
@prefix sh: <http://www.w3.org/ns/shacl#> .

schema:SchemaShape
    a sh:NodeShape ;
    sh:target [
        a sh:SPARQLTarget ;
        sh:prefixes hr: ;
        sh:select """
            SELECT ?this
            WHERE {
                ?this ?p ?o .
            }
            """ ;
    ] ; 

    sh:property [                
        sh:path ( rdf:type [ sh:zeroOrMorePath rdf:type ] ) ;
        sh:nodeKind sh:IRI ;
        sh:hasValue rdfs:Class
    ] ; 
.
'''

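data  = rdflib.Graph().parse( data = graph_data, format = 'turtle' )
shape = rdflib.Graph().parse( data = shape_data, format = 'turtle' )

# advanced = True enables SHACL Advanced Features such as sh:SPARQLTarget.
# Inference is left off so that only the triples actually in the data are validated.
report = validate( data, shacl_graph=shape, abort_on_error = False, meta_shacl = False, debug = False, advanced = True )
print( report[2] )  # report is ( conforms, results_graph, results_text )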
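
To illustrate what the path `( rdf:type [ sh:zeroOrMorePath rdf:type ] )` combined with `sh:hasValue rdfs:Class` checks, here is a plain-Python sketch with the data graph written as tuples. It is not how pySHACL is implemented; the function names are hypothetical and prefixed names are treated as plain strings.

```python
RDF_TYPE = "rdf:type"

# The data graph from above as (subject, predicate, object) tuples.
TRIPLES = [
    ("hr:Employee", RDF_TYPE, "rdfs:Class"),
    ("hr:BadThree", "rdfs:comment", "some comment about missing"),
    ("hr:BadTwo", RDF_TYPE, "hr:BadOne"),
    ("hr:YetAnother", RDF_TYPE, "hr:Another"),
    ("hr:YetAnotherName", RDF_TYPE, "hr:AnotherName"),
    ("hr:Another", RDF_TYPE, "hr:Employee"),
    ("hr:AnotherName", RDF_TYPE, "hr:name"),
    ("hr:BadOne", RDF_TYPE, "hr:Dangling"),
    ("hr:name", RDF_TYPE, "rdf:Property"),
]


def types_of(node):
    """Direct rdf:type values of a node."""
    return {o for s, p, o in TRIPLES if s == node and p == RDF_TYPE}


def reachable_by_type_chain(node):
    """Nodes reached by one rdf:type step followed by zero or more further steps."""
    seen = set()
    frontier = types_of(node)
    while frontier:
        current = frontier.pop()
        if current not in seen:
            seen.add(current)
            frontier |= types_of(current)
    return seen


# The target selects every subject in the data graph.
focus_nodes = {s for s, _, _ in TRIPLES}

# sh:hasValue rdfs:Class: the set of value nodes must contain rdfs:Class.
failing = sorted(n for n in focus_nodes
                 if "rdfs:Class" not in reachable_by_type_chain(n))
print(failing)
# ['hr:AnotherName', 'hr:BadOne', 'hr:BadThree', 'hr:BadTwo', 'hr:YetAnotherName', 'hr:name']
```

The six failing nodes are the same six subjects listed as expected in the question; hr:Employee, hr:Another and hr:YetAnother pass because their rdf:type chain ends at rdfs:Class.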

An alternative using a SPARQL based constraint would look like:

import rdflib
from pyshacl import validate

graph_data = """
@prefix hr: <http://learningsparql.com/ns/humanResources#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xml: <http://www.w3.org/XML/1998/namespace> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix schema: <http://schema.org/> .
@prefix sh: <http://www.w3.org/ns/shacl#> .

hr:Employee a rdfs:Class .
hr:BadThree rdfs:comment "some comment about missing" .
hr:BadTwo a hr:BadOne .
hr:YetAnother a hr:Another .
hr:YetAnotherName a hr:AnotherName .
hr:Another a hr:Employee .
hr:AnotherName a hr:name .
hr:BadOne a hr:Dangling .
hr:name a rdf:Property .
"""

shape_data = '''
@prefix hr: <http://learningsparql.com/ns/humanResources#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xml: <http://www.w3.org/XML/1998/namespace> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix schema: <http://schema.org/> .
@prefix sh: <http://www.w3.org/ns/shacl#> .

schema:SchemaShape
    a sh:NodeShape ;
    sh:target [
        a sh:SPARQLTarget ;
        sh:select """
            SELECT ?this
            WHERE {
                ?this ?p ?o .
            }
            """ ;
    ] ; 

    sh:sparql [ 
        a sh:SPARQLConstraint ; 
        sh:message "Node does not have type rdfs:Class." ; 
        sh:prefixes hr: ; 
        sh:select """ 
            SELECT $this 
            WHERE { 
                $this rdf:type ?o . 

                FILTER NOT EXISTS {
                    ?o rdf:type* rdfs:Class
                }
                FILTER ( strstarts( str( $this ), str( hr: ) ) ) 
            }
            """ ;
    ]
.
'''


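data  = rdflib.Graph().parse( data = graph_data, format = 'turtle' )
shape = rdflib.Graph().parse( data = shape_data, format = 'turtle' )

# advanced = True is required for sh:SPARQLTarget; inference stays disabled.
report = validate( data, shacl_graph=shape, abort_on_error = False, meta_shacl = False, debug = False, advanced = True )
print( report[2] )  # report is ( conforms, results_graph, results_text )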
James Hudson
  • Regarding 1): from what I understand, this just defines some query-local prefixes which you can then use in the SPARQL query itself. Nowhere is it described as a filter on the bindings of the `$this` variable. Regarding 2): in your case, RDFS inference rules are applied to the data graph before validation, i.e. new triples are generated according to the RDFS rules and added to the data graph. – UninformedUser Apr 14 '20 at 17:39
  • if you're not happy with it, why aren't you using also a SPARQL based constraint? Instead of the `sh:property` part, try maybe `sh:sparql [ sh:message "Node does not have type rdfs:Class." ; sh:prefixes hr: ; sh:select """ SELECT $this WHERE { $this rdf:type ?o . FILTER NOT EXISTS {?o rdf:type* rdfs:Class} FILTER (strstarts(str($this), str(hr:)))} """ ;` - isn't this close to what you want? – UninformedUser Apr 15 '20 at 06:43
  • I am not sure what advantage the SPARQL based constraint has over the one used in the answer. They both seem to have the same result. – James Hudson Apr 16 '20 at 15:44
  • That is true, I never said anything different. My point was that you were struggling with a "normal" SHACL constraint but initially said that you were already able to achieve what you need via SPARQL. So my hint was just to show you the option of using SPARQL for the constraint as well, in addition to selecting the target nodes - just for next time, if you already have the query. – UninformedUser Apr 17 '20 at 08:26