Section 9.4 Datatype Definitions of the OWL 2 Web Ontology Language Structural Specification shows how custom datatypes can be defined, giving the following example:

a:SSN rdf:type rdfs:Datatype .

a:SSN owl:equivalentClass [
  rdf:type rdfs:Datatype ;
  owl:onDatatype xsd:string ;
  owl:withRestrictions (
    [ xsd:pattern "[0-9]{3}-[0-9]{2}-[0-9]{4}" ]
  )
] .

a:hasSSN rdfs:range a:SSN .

So here we’re defining a new datatype a:SSN by restricting the xsd:string datatype via the xsd:pattern facet. So far so good.

But then the specification says something I don’t understand:

The datatypes defined by datatype definition axioms … have empty lexical spaces and therefore they must not occur in literals.

Why would a:SSN have an empty lexical space here? It was defined by constraining the value space of xsd:string via the xsd:pattern facet. Section 4.3.4 pattern of XSD 1.1 Part 2: Datatypes says that

pattern is a constraint on the ·value space· of a datatype which is achieved by constraining the ·lexical space· to ·literals· which match each member of a set of ·regular expressions·.

So we’re constraining the value space of xsd:string, but we’re doing that by constraining the lexical space of xsd:string (the set of finite-length sequences of zero or more characters … that ·match· the Char production from XML) to literals that match the regular expression. So why does the OWL spec say that the lexical space of a:SSN is empty, rather than the set of finite-length sequences of zero or more characters (as defined in XML) that match the regular expression [0-9]{3}-[0-9]{2}-[0-9]{4}?

More pragmatically, the OWL spec says

… there can be no literals of datatype a:SSN.

So does that mean that a:SSN cannot be used as follows?

a:Jane a:hasSSN "123-45-6789"^^a:SSN .

If so, how is one supposed to use the a:SSN datatype? Is the idea that one should write

a:Jane a:hasSSN "123-45-6789"^^xsd:string .

and infer from the declared range of a:hasSSN what the actual datatype is and thus whether the value is valid?

Ryan Shaw

3 Answers

Why would a:SSN have an empty lexical space here?

Datatypes and literal values are notoriously difficult to handle in symbolic reasoning. When you have a symbolic logic, such as first order logic or description logics, symbols denote things that are arbitrary elements in arbitrary sets. You don't need to know what the symbols denote in order to perform correct and complete reasoning (for instance, http://dbpedia.org/resource/France may denote anything, as far as a reasoner is concerned, and it is impossible to constrain this IRI to denote a specific thing in FOL or DL).

For literals, it is a completely different story because they are quantified. They denote specific values in specific sets. For instance "10"^^xsd:integer denotes the number “ten” and nothing else. This matters for the reasoner, because it has to understand that this is different from what "10"^^xsd:string denotes, but the same as what "10.0"^^xsd:decimal denotes. This means that however you implement your reasoner, there must be a part of the code specifically dedicated to processing the literals with datatype IRI xsd:integer. Thanks to this dedicated code, an OWL reasoner is able to infer:

<s> <o> "10"^^xsd:int .

from:

<s> <o> "10.0"^^xsd:decimal .
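To make the "dedicated code" idea concrete, here is a minimal Python sketch of a per-datatype lexical-to-value mapping. The table `LEXICAL_TO_VALUE` and the function `value_of` are hypothetical names chosen for illustration, not part of any OWL reasoner, and Python's `int` and `Decimal` parsers are only an approximation of the XSD lexical rules.

```python
from decimal import Decimal

# Hypothetical lexical-to-value mappings for three built-in datatypes.
# A real reasoner ships one such mapping per datatype it supports.
LEXICAL_TO_VALUE = {
    "xsd:integer": lambda lex: Decimal(int(lex)),
    "xsd:decimal": lambda lex: Decimal(lex),
    "xsd:string": lambda lex: lex,
}

def value_of(lexical_form, datatype):
    """Map a (lexical form, datatype IRI) pair to the value it denotes."""
    return LEXICAL_TO_VALUE[datatype](lexical_form)

ten_integer = value_of("10", "xsd:integer")
ten_decimal = value_of("10.0", "xsd:decimal")
ten_string = value_of("10", "xsd:string")

print(ten_integer == ten_decimal)  # the two numeric literals denote one value
print(ten_integer == ten_string)   # a number never equals a string
```

A datatype IRI that has no entry in such a table cannot be interpreted at all, which is the situation the next paragraph describes.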

If an ontology can introduce new datatype IRIs that can be used on literals, then you don't have the specifically dedicated code for literals with these types. Now, consider the following:

ex:one  a  rdfs:Datatype;
  owl:equivalentClass  [
    a  rdfs:Datatype ;
    owl:onDatatype  xsd:positiveInteger ;
    owl:withRestrictions ( [ xsd:maxInclusive 1 ] )
  ] .

Then, should the following be a well formed literal, given this datatype definition?

"1.0"^^ex:one

You see, "1.0" is in the lexical space of xsd:decimal and maps to the numeric value “one” in this datatype. The value “one” is also part of the value space of xsd:positiveInteger, but "1.0" is not a valid lexical form for an xsd:positiveInteger. You could argue that ex:one must only use the lexical forms of xsd:positiveInteger because it is defined as a restriction of it. But the problem is that you then have a semantic description (a piece of ontology) that defines a syntactic constraint (the way you are allowed to write a literal with a specific datatype IRI). Logicians know that logics that allow one to constrain the syntax with their semantics are devilish.
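The gap between lexical space and value space can be sketched in a few lines of Python. The regular expressions below are simplified approximations of the XSD lexical definitions, and the function names are made up for this example.

```python
import re
from decimal import Decimal

# Simplified lexical spaces (assumption: close enough to the XSD
# definitions for this illustration, not a complete implementation).
INTEGER_DIGITS = re.compile(r"\+?[0-9]+")
DECIMAL_FORM = re.compile(r"[+-]?([0-9]+(\.[0-9]*)?|\.[0-9]+)")

def is_positive_integer_form(lexical_form):
    """True if the string is a lexical form of xsd:positiveInteger."""
    return (INTEGER_DIGITS.fullmatch(lexical_form) is not None
            and int(lexical_form) > 0)

def is_decimal_form(lexical_form):
    """True if the string is a lexical form of xsd:decimal."""
    return DECIMAL_FORM.fullmatch(lexical_form) is not None

print(is_decimal_form("1.0"))           # "1.0" is a decimal lexical form
print(is_positive_integer_form("1.0"))  # but not a positiveInteger one
print(Decimal("1.0") == Decimal("1"))   # both strings denote the same number
```

So the same value is reachable through one datatype's syntax and not the other's, and the ontology alone does not say which syntax ex:one should inherit.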

Because the OWL 2 specification gives ex:one an empty lexical space, it is possible to say that ex:one is the same datatype as ex:oneD, defined as follows:

ex:oneD  a  rdfs:Datatype;
  owl:equivalentClass  [
    a  rdfs:Datatype ;
    owl:onDatatype  xsd:decimal ;
    owl:withRestrictions ( [ xsd:minInclusive 1 ] [ xsd:maxInclusive 1 ] )
  ] .
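As a rough check that the two definitions pick out the same values, the following Python sketch encodes each restriction as a membership test and compares them on a handful of sample numbers. The function names are hypothetical, and agreement on a finite sample only illustrates the claim; it does not prove it.

```python
from decimal import Decimal

def in_ex_one(value):
    """Value space of ex:one: positive integers that are at most 1."""
    is_integer = value == value.to_integral_value()
    return is_integer and 0 < value <= 1

def in_ex_one_d(value):
    """Value space of ex:oneD: decimals between 1 and 1 inclusive."""
    return 1 <= value <= 1

# Sample values only; this is an illustration, not an exhaustive proof.
SAMPLES = [Decimal(s) for s in ("-1", "0", "0.5", "1", "1.0", "1.5", "2")]
agree = all(in_ex_one(v) == in_ex_one_d(v) for v in SAMPLES)
print(agree)
```

Both tests accept only the number one, regardless of whether it was written as "1" or "1.0".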

One more remark, though: what I'm saying here is only valid when you consider the OWL 2 Direct Semantics. If you consider the OWL 2 RDF-based semantics, then there are other things to consider. In particular, in the RDF-based semantics, it is not necessarily the case that ex:one is the same as ex:oneD. They may be distinct datatypes that happen to have the same value space.

Regarding your other questions:

So why does the OWL spec say that the lexical space of a:SSN is empty, rather than the set of finite-length sequences of zero or more characters (as defined in XML) that match the regular expression [0-9]{3}-[0-9]{2}-[0-9]{4}?

Here, you are considering the xsd:string datatype, where the value space and the lexical space are the same. The lexical-to-value mapping is identity. So it looks like there would be a trivial way to allow the datatype IRI to be used on literals. But consider the broader problem as I showed you before.

So does that mean that a:SSN cannot be used as follows?

Exactly.

If so, how is one supposed to use the a:SSN datatype?

You can use datatypes defined in this way as the range of a property, for instance, or in allValuesFrom or someValuesFrom restrictions. However, when it comes to concrete values attached to instances, you have to use the datatypes that are natively supported by OWL 2 reasoners, as you suggest in your last code snippet.
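For the a:SSN case, the check could look roughly like the Python sketch below: the literal keeps the built-in datatype xsd:string, and its lexical form is tested against the pattern facet of the property's range. The dictionaries and the function `literal_in_range` are hypothetical and not a real reasoner API. Python's `re.fullmatch` stands in for the implicit anchoring of XSD patterns, and rejecting every datatype other than the base is a simplification.

```python
import re

# Hypothetical in-memory copy of the a:SSN definition from the question:
# base datatype xsd:string, restricted by one xsd:pattern facet.
CUSTOM_DATATYPES = {
    "a:SSN": {"base": "xsd:string", "pattern": r"[0-9]{3}-[0-9]{2}-[0-9]{4}"},
}
PROPERTY_RANGES = {"a:hasSSN": "a:SSN"}

def literal_in_range(prop, lexical_form, datatype):
    """Check a typed literal against the custom range of a property."""
    definition = CUSTOM_DATATYPES[PROPERTY_RANGES[prop]]
    # Simplification: only literals typed with the base datatype are accepted.
    if datatype != definition["base"]:
        return False
    return re.fullmatch(definition["pattern"], lexical_form) is not None

print(literal_in_range("a:hasSSN", "123-45-6789", "xsd:string"))
print(literal_in_range("a:hasSSN", "123456789", "xsd:string"))
```

An actual OWL reasoner would not return a boolean like this; it would report the ontology as inconsistent when a value falls outside the range.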

Antoine Zimmermann
  • Why does the lexical space have to be defined as empty? What does OWL do when it encounters an unknown/undefined datatype and why can't it do the same for datatypes defined this way? It feels as if OWL is saying here "do not use this datatype at all" despite having a perfect vocabulary to define it. I might want to describe a certain datatype using both OWL and XML Schema but this would make it inconsistent if OWL defines the lexical space to be empty. Or is that just the artifact of the specific semantics, and the "meaning" of the datatype definition is different? – IS4 Apr 04 '23 at 20:12
  • The spec does say "do not use this datatype", but not "at all". It just forbids you from using the datatype IRI in a literal. But since the custom datatype must be based on a standard datatype, you can always write a literal with a value in the custom datatype by using the datatype IRI of the standard datatype. If I want to refer to the number "one", I can simply write `"1"^^xsd:integer`, or `"1"^^xsd:decimal`, or `"1"^^xsd:int`, or `"1"^^xsd:nonNegativeInteger`, and so on. Why would I use a cryptic IRI `ex:one` instead? However, I can use the custom datatype as the range of a datatype property. – Antoine Zimmermann Apr 06 '23 at 11:36
  • Sure, I get that the datatype can still be used for reasoning, but there are many types outside XSD that are useful for literals, have their specific lexical space or lexical-to-value mapping (such as the `i18n` namespace). The usage of a datatype in a literal is not just about the value, it is also about the *intent* ‒ from a purely OWL/value perspective, `"en"^^xsd:language` is the same as `"en"`, but still the data and the intent is different. – IS4 Apr 06 '23 at 12:19
  • What I was getting at, is whether a datatype described using the OWL vocabulary has "objectively" empty lexical space, or whether it only appears as such when using OWL reasoners. Say I make my own datatype `ex:color` for identifying colors using `rgb(x,y,z)`; the question is whether I am allowed to describe it using OWL without "objectively" implicating that the lexical space is empty. – IS4 Apr 06 '23 at 12:22

It seems to me that a:SSN has an empty lexical space in the example because it isn't itself "the set of finite-length sequences of zero or more characters that match the regular expression [0-9]{3}-[0-9]{2}-[0-9]{4}." Rather, that is the definition of a:SSN. The definition itself was made by constraining a datatype (xsd:string) that does not have an empty lexical space, which is why the section on patterns that you cited applies. That is, the example uses a pattern to constrain a datatype with a non-empty lexical space in order to define a datatype with an empty lexical space. Accordingly, since "there can be no literals of datatype a:SSN," you would have to establish that "123-45-6789"^^xsd:string is an SSN either by inferring it from the usage of a:hasSSN or by asserting that "123-45-6789"^^xsd:string is an instance of a:SSN.

sjhuskey
  • I’m not sure what you mean by “…it isn't itself ‘the set of finite-length sequences of zero or more characters that match the regular expression [0-9]{3}-[0-9]{2}-[0-9]{4}.’” If a datatype consists of a value space, a lexical space, and facets, I can understand why the value space of `a:SSN` might be undefined here. And I can understand why it doesn’t have facets. But it seems that the lexical space has been defined for `a:SSN` every bit as much as it has for `xsd:string`. – Ryan Shaw Mar 29 '21 at 20:37
  • How would one assert that `"123-45-6789"^^xsd:string` is an instance of `a:SSN` other than via a typed literal? – Ryan Shaw Mar 29 '21 at 20:41
  • In answer (I hope -- it's confusing to me, too) to the first question, the lexical space for `a:SSN` has not been defined. Rather, the lexical space for `xsd:string` within `a:SSN` has been defined inside of a `DatatypeRestriction`. As for the second question, couldn't you do this: `ClassAssertion ( a:SSN a:123-45-6789 )`? I'm not sure why one would want to do that, since `a:hasSSN` is sufficient to allow the inference that the literal "123-45-6789" is an SSN. – sjhuskey Mar 29 '21 at 22:13
  • I took that as meaning "the value space for the new datatype is represented by lexical values that belong to the lexical space for the restricted datatype." I'm not sure if the distinction is important in most use cases. – Ignazio Mar 30 '21 at 11:49
  • Ignazio said more elegantly what I was trying to say. – sjhuskey Mar 30 '21 at 12:57

However, when it comes to concrete values attached to instances, you have to use the datatypes that are natively supported by OWL 2 reasoners, as you suggest in your last code snippet.

Otherwise, augment the capabilities of the reasoner for your application. But don't expect these capabilities to be available when the data is processed by a 'generic' OWL reasoner.

Simon Cox