I have a problem with the RDF4J SHACL validation engine.

This is my code, which loads the SHACL rules and an example payload, both as Turtle (TTL) files:

String SHAPES = "rules.ttl";
String DATA = "input.ttl";

ShaclSail shaclSail = new ShaclSail(new MemoryStore());
Repository repo = new SailRepository(shaclSail);

try (RepositoryConnection connection = repo.getConnection()) {
    // Load the shapes into the reserved SHACL shapes graph.
    connection.begin();
    connection.add(new StringReader(Files.readString(Path.of(SHAPES))), RDFFormat.TURTLE, RDF4J.SHACL_SHAPE_GRAPH);
    connection.commit();

    // Load the data; the ShaclSail validates it when the transaction commits.
    connection.begin();
    connection.add(new StringReader(Files.readString(Path.of(DATA))), RDFFormat.TURTLE);
    connection.commit();

    // Remove the shapes again.
    connection.begin();
    connection.clear(RDF4J.SHACL_SHAPE_GRAPH);
    connection.commit();

} catch (Exception e) {
    // A failed validation surfaces as the cause of the exception thrown by commit().
    Throwable cause = e.getCause();
    if (cause instanceof ValidationException) {
        Model validationReportModel = ((ValidationException) cause).validationReportAsModel();
        Rio.write(validationReportModel, System.out, RDFFormat.TURTLE);
    }
}

The SHACL rules file looks like this:

@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix shacl: <http://www.w3.org/ns/shacl#> .

<http://example.com/Property>
  a owl:DatatypeProperty, rdf:Property .

<http://example.com/Class>
  a owl:Class .

<http://example.com/ClassShape>
  a <http://www.w3.org/ns/shacl#NodeShape> ;
  shacl:property <http://example.com/PropertyShape> ;
  shacl:targetClass <http://example.com/Class> .

<http://example.com/PropertyShape>
  a shacl:PropertyShape ;
  shacl:datatype xsd:integer ;
  shacl:maxCount 1 ;
  shacl:minCount 1 ;
  shacl:path <http://example.com/Property> .

and the input file:

<http://example.com/1>
  a <http://example.com/Class> ;
  <http://example.com/Property> "test" .

The snippet above returns the expected datatype violation, because <http://example.com/Property> "test" . is not an integer as required by shacl:datatype xsd:integer ; in the rules. This is the behaviour I expect.

But when I use exactly the same data but in JSON-LD format:

...
connection.add(new StringReader(Files.readString(Path.of(SHAPES))), RDFFormat.JSONLD, RDF4J.SHACL_SHAPE_GRAPH);
...
connection.add(new StringReader(Files.readString(Path.of(DATA))), RDFFormat.JSONLD);
...

with this rules file:

{
  "@context": {
    "id": "@id",
    "type": "@type",
    "datatype": {
      "@id": "http://www.w3.org/ns/shacl#datatype",
      "@type": "@id"
    },
    "path": {
      "@id": "http://www.w3.org/ns/shacl#path",
      "@type": "@id"
    },
    "class": {
      "@id": "http://www.w3.org/ns/shacl#class",
      "@type": "@id"
    },
    "property": {
      "@id": "http://www.w3.org/ns/shacl#property",
      "@type": "@id",
      "@container": "@set"
    },
    "targetClass": {
      "@id": "http://www.w3.org/ns/shacl#targetClass",
      "@type": "@id"
    },
    "maxCount": "http://www.w3.org/ns/shacl#maxCount",
    "minCount": "http://www.w3.org/ns/shacl#minCount",
    "PropertyShape": "http://www.w3.org/ns/shacl#PropertyShape",
    "Property": "http://www.w3.org/1999/02/22-rdf-syntax-ns#Property",
    "NodeShape": "http://www.w3.org/ns/shacl#NodeShape",
    "Class": "http://www.w3.org/2002/07/owl#Class",
    "DatatypeProperty": "http://www.w3.org/2002/07/owl#DatatypeProperty"
  },
  "@graph": [
    {
      "id": "http://example.com/Property",
      "type": [
        "DatatypeProperty",
        "Property"
      ]
    },
    {
      "id": "http://example.com/Class",
      "type": "Class"
    },
    {
      "id": "http://example.com/ClassShape",
      "type": "NodeShape",
      "property": [
        "http://example.com/PropertyShape"
      ],
      "targetClass": "http://example.com/Class"
    },
    {
      "id": "http://example.com/PropertyShape",
      "type": "PropertyShape",
      "datatype": "http://www.w3.org/2001/XMLSchema#integer",
      "maxCount": 1,
      "minCount": 1,
      "path": "http://example.com/Property"
    }
  ]
}

and this content file:

{
    "@context": {
        "Class": "http://example.com/Class",
        "property": {
            "@id": "http://example.com/Property",
            "@type": "http://www.w3.org/2001/XMLSchema#integer"
        }
    },
    "@graph": {
        "@type": "Class",
        "@id": "http://example.com/1",
        "property": "test"
    }
}

I get no violations, which is very surprising, as other SHACL engines like Apache Jena or the online playground at https://shacl.org/playground/ return the expected datatype violation.

I do get a datatype violation when I remove the property type from the @context:

"@type": "http://www.w3.org/2001/XMLSchema#integer"

or change the property to an ObjectProperty:

"property": {
  "@id": "http://example.com/property1"
  ...
}

but there are no datatype violations when, for example, I use a string instead of an integer.

Is this a bug in RDF4J or expected behaviour?

Krzysztof Majewski

1 Answer

What you are observing is an effect of Type Coercion in JSON-LD. It seems that some parsers (even the EasyRDF one) perform an additional step when accepting a typed value, if the type is a known numeric type ‒ they attempt to parse it as an actual number and use the result in the output, disregarding the original lexical value. Try parsing "property": " 100.0 " ‒ it will trim the spaces and remove .0 before converting it to an integer.

This parsing can obviously fail, and the parser may not handle that failure correctly. In the case of EasyRDF, this is the result of "property": "test":

<http://example.com/1>
  a <http://example.com/Class> ;
  <http://example.com/Property> 0 .

I assume that the parser used by the SHACL Playground is a bit more intelligent, and could treat it as the following:

<http://example.com/1>
  a <http://example.com/Class> ;
  <http://example.com/Property> "test"^^<http://www.w3.org/2001/XMLSchema#integer> .

Note that this is what you actually specified in the JSON-LD: by saying that the property has type xsd:integer, the value can never be a string unless you use a value object. The result should not only be invalid according to the SHACL rules, but also contradictory in RDFS and OWL.
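
To make the difference between the two parser strategies concrete, here is a minimal Java sketch that uses only the standard library. It is not the code of EasyRDF, RDF4J, or the SHACL Playground: the names IntegerCoercionSketch, coerceLeniently and isValidXsdInteger are made up for illustration, and the fallback to 0 is an assumption modelled on the EasyRDF output shown earlier.

```java
import java.math.BigDecimal;
import java.math.BigInteger;
import java.util.regex.Pattern;

/** Illustrative only: not taken from any real JSON-LD parser. */
final class IntegerCoercionSketch {

    // Lexical space of xsd:integer: an optional sign followed by one or more digits.
    // Deliberately simple: no whitespace normalization is applied here.
    private static final Pattern XSD_INTEGER = Pattern.compile("[+-]?[0-9]+");

    /**
     * Lenient strategy (assumed): trim the input, parse it as a decimal number,
     * drop the fraction, and silently fall back to 0 when parsing fails.
     */
    static BigInteger coerceLeniently(String lexical) {
        try {
            return new BigDecimal(lexical.trim()).toBigInteger();
        } catch (NumberFormatException e) {
            return BigInteger.ZERO; // the original lexical value is lost here
        }
    }

    /**
     * Strict strategy: keep the lexical form untouched and only report
     * whether it is a valid xsd:integer.
     */
    static boolean isValidXsdInteger(String lexical) {
        return XSD_INTEGER.matcher(lexical).matches();
    }

    public static void main(String[] args) {
        for (String value : new String[] {"test", " 100.0 ", "10"}) {
            System.out.println("\"" + value + "\""
                    + " lenient=" + coerceLeniently(value)
                    + " validXsdInteger=" + isValidXsdInteger(value));
        }
    }
}
```

With the lenient strategy, "test" silently becomes 0, which is a perfectly valid integer as far as shacl:datatype xsd:integer is concerned. With the strict strategy, the literal keeps its lexical form "test" and a validator has something to flag.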


I don't have a clear solution, since I don't know if the RDF4J JSON-LD parser could be configured to behave differently. However, these are the options I can think of:

  • Don't use "@type": "http://www.w3.org/2001/XMLSchema#integer". The producer of JSON-LD will have to be explicit about the type of the value, and even something like "10" will not pass ‒ it will have to be 10 as a number in JSON. This could still be circumvented with a value object (with @type).
  • Don't disregard the parsed JSON-LD data; save it alongside the input, in order to have something that is valid in all SHACL processors (but may not be what the producer of JSON-LD intended).
  • Don't use RDFFormat.JSONLD or use a different parser first, before giving the data to the SHACL validator.
  • Require that http://example.com/Property be non-zero in SHACL, assuming that 0 is indeed the value appearing there.

Side note: the JSON-LD parser used by the SHACL Playground is definitely not perfect either, as it also parses "10.5" or "10whatever" as 10. It seems to use JavaScript's parseInt or something similar.
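
For readers unfamiliar with that kind of prefix parsing, the following Java sketch imitates it for base 10: it reads the leading digits and ignores whatever follows. This is only an approximation written for illustration, not the Playground's code; the names PrefixIntSketch and parseLeadingInt are hypothetical, and null stands in for JavaScript's NaN.

```java
/** Illustrative only: a rough base-10 imitation of prefix-style integer parsing. */
final class PrefixIntSketch {

    /**
     * Skips leading whitespace, reads an optional sign and as many digits as
     * possible, and ignores the rest of the string. Returns null when no digit
     * is found. Overflow is not handled in this sketch.
     */
    static Long parseLeadingInt(String s) {
        int i = 0;
        int n = s.length();
        while (i < n && Character.isWhitespace(s.charAt(i))) {
            i++;
        }
        boolean negative = false;
        if (i < n && (s.charAt(i) == '+' || s.charAt(i) == '-')) {
            negative = s.charAt(i) == '-';
            i++;
        }
        int firstDigit = i;
        long value = 0;
        while (i < n && s.charAt(i) >= '0' && s.charAt(i) <= '9') {
            value = value * 10 + (s.charAt(i) - '0');
            i++;
        }
        if (i == firstDigit) {
            return null; // no digits at all, e.g. "test"
        }
        return negative ? -value : value;
    }

    public static void main(String[] args) {
        for (String value : new String[] {"10.5", "10whatever", "test"}) {
            System.out.println("\"" + value + "\" -> " + parseLeadingInt(value));
        }
    }
}
```

A parser built this way accepts "10.5" and "10whatever" as 10, so such inputs pass an xsd:integer check even though their lexical forms are not valid integers.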

IS4