I'm currently working with a large XML file, the OpenCyc ontology. (You can download it as opencyc-latest.owl.gz from here: http://sw.opencyc.org/)

This XML file contains lines like these:

<owl:ObjectProperty rdf:about="Mx4rvVi4w5wpEbGdrcN5Y29ycA">
    <rdfs:label xml:lang="en">Arg 3 Genl</rdfs:label>
    <cycAnnot:label xml:lang="en">arg3Genl</cycAnnot:label>
    <!-- [...] -->

    <!-- [Strange lines begin here] -->
    <Mx4rvViAzpwpEbGdrcN5Y29ycA 
      rdf:datatype="http://www.w3.org/2001/XMLSchema#integer"
      >M4I</Mx4rvViAzpwpEbGdrcN5Y29ycA>
    <Mx4rv6Bnr5wpEbGdrcN5Y29ycA 
      rdf:datatype="http://www.w3.org/2001/XMLSchema#integer"
      >M4M</Mx4rv6Bnr5wpEbGdrcN5Y29ycA>
    <!-- [Strange lines ended here] -->

    <!-- [...] -->
</owl:ObjectProperty>

Don't worry about the tag names; that is how OpenCyc actually names its tags. I'd rather draw attention to their content.

For those not familiar with RDF/XML documents: the rdf:datatype attribute on the two strange elements says that the element's content should be interpreted as an XML Schema integer.

My questions boil down to: Are M4I and M4M (or other strange values that I have found so far, like M4E and M4Q) actually valid XML Schema integers? Or are these errors in the OpenCyc ontology?

If they are actually valid, what do they mean, and why are they valid? (That is, which documentation should I read to understand them?)

C. M. Sperberg-McQueen
Hauke P.

3 Answers

3

The literals you're referring to are not valid integers. The lexical representation of integers in the XML Schema type system is defined at http://www.w3.org/TR/xmlschema-2/#integer.

It basically says:

integer has a lexical representation consisting of a finite-length sequence of decimal digits (#x30-#x39) with an optional leading sign. If the sign is omitted, "+" is assumed. For example: -1, 0, 12678967543233, +100000.

According to the described semantics, your file is invalid.
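
To make the rule concrete, here is a minimal sketch of that lexical check in Python. The function name `is_xs_integer` is made up for illustration, and the check covers only the lexical form quoted above (plus the whitespace collapsing that applies to xs:integer); it is not a full schema validator.

```python
import re

# Lexical form of xs:integer: an optional sign followed by one or more
# decimal digits. [0-9] is used instead of \d so that only #x30-#x39 match.
_XS_INTEGER = re.compile(r"[+-]?[0-9]+")


def is_xs_integer(literal: str) -> bool:
    """Return True if `literal` matches the xs:integer lexical form."""
    # Leading and trailing XML whitespace is stripped before matching,
    # mirroring the whiteSpace=collapse facet of xs:integer.
    return _XS_INTEGER.fullmatch(literal.strip(" \t\r\n")) is not None


if __name__ == "__main__":
    for value in ["-1", "0", "12678967543233", "+100000", "M4I", "M4M"]:
        print(value, is_xs_integer(value))
```

The four examples from the specification pass this check, while M4I and M4M fail it because they contain letters.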

Petru Gardea
  • Thanks! Now to the question where those values came from... but that's something for a different place. :S – Hauke P. Mar 21 '15 at 17:01
  • I bet those are some sort of proprietary "things". It reminded me of one proposal re: how to deal with non-standard HTTP headers. One was to use X- as a prefix, the other was to use gibberish, just like you show here. I personally resonated with the logic behind the gibberish... – Petru Gardea Mar 21 '15 at 17:06
2

This is indeed an error in the OpenCyc OWL file. M4I should be 2, and M4M should be 3. We are currently working on a new, updated set of OpenCyc OWL files, and will be sure to correct this. Thank you for reporting it.

0

Using the XML Schema specification, part 2, section 3.3.13.1, I can answer one of your questions: M4I and M4M are not valid instances of xs:integer. I can't answer any of the others.
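
If you want to find every such literal in a document, a rough sketch along these lines might help. It uses Python's standard xml.etree.ElementTree; the function name `invalid_integer_literals`, the default namespace `http://example.org/cyc#`, and the `validCount` element in the sample are assumptions made up for illustration, not taken from the OpenCyc file.

```python
import re
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"
INTEGER_RE = re.compile(r"[+-]?[0-9]+")


def invalid_integer_literals(xml_text):
    """Return (tag, text) pairs for elements typed as xs:integer
    whose text content is not a valid xs:integer literal."""
    root = ET.fromstring(xml_text)
    bad = []
    for elem in root.iter():
        # ElementTree exposes namespaced attributes as "{uri}localname".
        if elem.get(f"{{{RDF_NS}}}datatype") == XSD_INTEGER:
            text = (elem.text or "").strip()
            if INTEGER_RE.fullmatch(text) is None:
                bad.append((elem.tag, text))
    return bad


# Hypothetical sample: the default namespace URI and the validCount
# element are invented so that the snippet parses on its own.
SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns="http://example.org/cyc#">
  <Mx4rvViAzpwpEbGdrcN5Y29ycA
    rdf:datatype="http://www.w3.org/2001/XMLSchema#integer"
    >M4I</Mx4rvViAzpwpEbGdrcN5Y29ycA>
  <validCount
    rdf:datatype="http://www.w3.org/2001/XMLSchema#integer"
    >3</validCount>
</rdf:RDF>"""

if __name__ == "__main__":
    for tag, text in invalid_integer_literals(SAMPLE):
        print(tag, repr(text))
```

For a file as large as the OpenCyc ontology, parsing everything into memory with `ET.fromstring` may be impractical; `ET.iterparse` would probably be the better fit there.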

Joshua Taylor
Michael Kay
  • Thanks. On which definition do you base your answer? Or to put it differently: Where are valid xs:integer representations defined? – Hauke P. Mar 21 '15 at 16:44
  • In the XML Schema specification, part 2, section 3.3.13.1: http://www.w3.org/TR/xmlschema-2/#integer – Michael Kay Mar 22 '15 at 19:13