
I just switched from TopBraid to try out Protege.

I have an ontology with some RDF that resembles this:

instances:some_thing1 a semapi:SomeClass ;
                               semapi:hasChainTo (
                                      [ 
                                            a semapi:SomeOtherClass ;
                                            semapi:hasChainTo (
                                                 [ ... ]
                                                 [ ... ]
                                            )
                                      ] 
                              ) .

The idea is that this nested blank-node syntax works well: the chains get very deep, and the syntax stays readable and maintainable as chains change from time to time and new chains are added.

Not only that, but I have already written queries for the resulting graph.

The problem is that if I import this into Protege and then save it right back out, the result is reformatted to something like:

   instance:some_thing1 rdf:type semapi:SomeClass ,
                               owl:NamedIndividual ;
                               semapi:hasChainTo [ ] .


   [ rdf:type semapi:SomeClass ;
       semapi:hasChainTo [ ]
   ] .

The resulting RDF completely breaks the querying system as well as the other benefits of using this approach to represent "chaining".

Is there any way I can get around this? If not I may be forced to switch back to TopBraid.

UPDATE: Here is a reproduction of the issue:

I wrote bugTest.ttl, opened it in Protege, and immediately did Save As > Turtle > bugTestOutput.ttl:

https://dl.dropboxusercontent.com/u/13814624/bugTest.ttl
https://dl.dropboxusercontent.com/u/13814624/bugTestOutput.ttl

Stanislav Kralin
parliament
  • Thank you for providing some samples of the data; these do illustrate the type of problem you're encountering. If everything is as you say, it sounds like either TopBraid or Protégé is losing important information in the saving or reading process, and that's a Bad Thing. Given the severity of the problem, can you produce a minimal example (complete RDF files) that illustrate this problem? Often, trying to reproduce the problem in minimal fashion will reveal a problem on your end that wasn't obvious before, or else serve as a very good sample to send to the appropriate developers. – Joshua Taylor Aug 06 '13 at 02:06
  • Additionally, this seems like the kind of problem that is either in TopBraid or in Protégé, but not both. If there's some RDF serialization of an ontology that Protégé mangles, it should do it regardless of what application created it. Similarly, if TopBraid is producing garbage, it should do it regardless of who will consume it. – Joshua Taylor Aug 06 '13 at 02:08
  • Thanks for the reply. I'll try to reproduce the problem bits now. – parliament Aug 06 '13 at 02:11
  • @JoshuaTaylor I updated OP with reproduction files. Thanks for looking into this. – parliament Aug 06 '13 at 02:31

1 Answer


In short, your ontology is not a valid OWL ontology, and Protégé is following the “garbage in, garbage out” principle. Since some bad data is coming in (though Protégé does try to salvage it), you get bad data out (actually, just the salvaged data). You can validate an ontology with the Manchester OWL Validator, but you'll need to select the OWL 2 DL profile to get the appropriate diagnostics. On your document, the output is:

The ontology and/or one of its imports is NOT in the OWL 2 DL profile

Imports Closure

Ontology IRI                                         Physical URI
OntologyID(OntologyIRI(<http://ideation.io/semapi>))

Detailed report

Use of reserved vocabulary for class IRI

SubClassOf(semapi:BaseClass rdfs:Class)

Use of undeclared class

SubClassOf(semapi:BaseClass rdfs:Class)

Aside from the fact that you have a triple:

<http://ideation.io/semapi>
      a       owl:Ontology .

in the first file, this doesn't appear to be an OWL ontology at all. E.g.,

semapi:BaseClass a rdfs:Class; 
                 rdfs:subClassOf rdfs:Class .

is defining some classes that could be used in an RDFS vocabulary, but it doesn't declare any owl:Classes. When you do something like

semapi:hasChainTo a owl:ObjectProperty; 
                  rdfs:domain semapi:BaseClass;
                  rdfs:range  semapi:BaseClass .

you've got an owl:ObjectProperty that relates instances of semapi:BaseClass, each of which is also an rdfs:Class (because semapi:BaseClass is declared a subclass of rdfs:Class). So the object property relates rdfs:Classes, but in OWL DL, object properties can only relate individuals. Where you start using RDF lists, i.e., in:

instances:Instance1 a semapi:DerivedClass;
                        semapi:hasChainTo (
                            [
                                a semapi:DerivedClass;
                                semapi:hasChainTo (
...

you're using an RDF list as the object in an object property assertion. RDF lists can't be used in OWL DL, however, because they're also used in the RDF serialization of OWL. It would seem, then, that Protégé is discarding a bunch of information that isn't meaningful to it as the RDF serialization of an OWL ontology. One might argue that when Protégé doesn't know what to do with some incoming RDF, it should preserve it, but that's really an untenable position when RDF is just one possible serialization of the thing (an OWL ontology) that Protégé is concerned with.
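To make the list point concrete, here is a small sketch of what the `( ... )` shorthand stands for. The prefixes are taken from the IRIs in the tool output; `instances:Instance2` is a hypothetical name used only so the two forms can sit side by side. The `rdf:first`/`rdf:rest` vocabulary shown here is the same one that the RDF serialization of OWL uses for constructs such as `owl:unionOf`, which is why it is not available for ordinary property values in OWL DL.

```turtle
@prefix rdf:       <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix semapi:    <http://ideation.io/semapi#> .
@prefix instances: <http://ideation.io/instances#> .

# Collection shorthand, as used in the question.
instances:Instance1 semapi:hasChainTo ( [ a semapi:DerivedClass ] ) .

# The same structure written out (hypothetical second instance).
# The object of semapi:hasChainTo is the list node itself; the
# chained node is only reachable through rdf:first.
instances:Instance2 semapi:hasChainTo [
    rdf:first [ a semapi:DerivedClass ] ;
    rdf:rest  rdf:nil
] .
```

Read this way, the saved output with `semapi:hasChainTo [ ]` is consistent with the list triples having been dropped while the property assertion to the (now empty) list node was kept.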

Pellet's lint tool produces a number of warnings:

[Untyped classes]
- http://ideation.io/semapi#DerivedClass
- http://ideation.io/semapi#BaseClass
- http://www.w3.org/2000/01/rdf-schema#Class

[Untyped individuals]
- 6 BNode(s)

[Using rdfs:Class instead of owl:Class]
- http://ideation.io/semapi#DerivedClass
- http://ideation.io/semapi#BaseClass



=========================================================
OWL 2 DL violations found for ontology <http://ideation.io/semapi>:
Use of undeclared class: <http://ideation.io/semapi#BaseClass> [ObjectPropertyRange(<http://ideation.io/semapi#hasChainTo> <http://ideation.io/semapi#BaseClass>) in <http://ideation.io/semapi>]
Use of undeclared class: <http://ideation.io/semapi#DerivedClass> [ClassAssertion(<http://ideation.io/semapi#DerivedClass> _:genid5) in <http://ideation.io/semapi>]
Use of undeclared class: rdfs:Class [SubClassOf(<http://ideation.io/semapi#BaseClass> rdfs:Class) in <http://ideation.io/semapi>]
Use of undeclared class: <http://ideation.io/semapi#DerivedClass> [ClassAssertion(<http://ideation.io/semapi#DerivedClass> _:genid11) in <http://ideation.io/semapi>]
Use of undeclared class: <http://ideation.io/semapi#BaseClass> [SubClassOf(<http://ideation.io/semapi#DerivedClass> <http://ideation.io/semapi#BaseClass>) in <http://ideation.io/semapi>]
Use of undeclared class: <http://ideation.io/semapi#DerivedClass> [ClassAssertion(<http://ideation.io/semapi#DerivedClass> _:genid9) in <http://ideation.io/semapi>]
Use of undeclared class: <http://ideation.io/semapi#BaseClass> [SubClassOf(<http://ideation.io/semapi#BaseClass> rdfs:Class) in <http://ideation.io/semapi>]
Use of undeclared class: <http://ideation.io/semapi#DerivedClass> [ClassAssertion(<http://ideation.io/semapi#DerivedClass> _:genid1) in <http://ideation.io/semapi>]
Use of undeclared class: <http://ideation.io/semapi#BaseClass> [ObjectPropertyDomain(<http://ideation.io/semapi#hasChainTo> <http://ideation.io/semapi#BaseClass>) in <http://ideation.io/semapi>]
Use of undeclared class: <http://ideation.io/semapi#DerivedClass> [ClassAssertion(<http://ideation.io/semapi#DerivedClass> _:genid7) in <http://ideation.io/semapi>]
Use of reserved vocabulary for class IRI: rdfs:Class [SubClassOf(<http://ideation.io/semapi#BaseClass> rdfs:Class) in <http://ideation.io/semapi>]
Use of undeclared class: <http://ideation.io/semapi#DerivedClass> [ClassAssertion(<http://ideation.io/semapi#DerivedClass> _:genid3) in <http://ideation.io/semapi>]
Use of undeclared class: <http://ideation.io/semapi#DerivedClass> [SubClassOf(<http://ideation.io/semapi#DerivedClass> <http://ideation.io/semapi#BaseClass>) in <http://ideation.io/semapi>]
Use of undeclared class: <http://ideation.io/semapi#DerivedClass> [ClassAssertion(<http://ideation.io/semapi#DerivedClass> <http://ideation.io/instances#Instance1>) in <http://ideation.io/semapi>]


No OWL lints found for ontology <http://ideation.io/semapi>.

<http://ideation.io/semapi> does not import other ontologies.
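The class-related warnings above ("Use of undeclared class", "Using rdfs:Class instead of owl:Class", "Use of reserved vocabulary") point at the schema declarations. A minimal sketch of how those declarations might look if the goal were an OWL 2 DL ontology is shown below. This is an assumption about intent, not a drop-in fix: it only addresses the class declarations, and RDF lists as property values would still fall outside OWL DL.

```turtle
@prefix owl:    <http://www.w3.org/2002/07/owl#> .
@prefix rdfs:   <http://www.w3.org/2000/01/rdf-schema#> .
@prefix semapi: <http://ideation.io/semapi#> .

<http://ideation.io/semapi> a owl:Ontology .

# Declared as owl:Class, and no longer a subclass of rdfs:Class.
semapi:BaseClass a owl:Class .

semapi:DerivedClass a owl:Class ;
    rdfs:subClassOf semapi:BaseClass .

# Domain and range now refer to declared OWL classes.
semapi:hasChainTo a owl:ObjectProperty ;
    rdfs:domain semapi:BaseClass ;
    rdfs:range  semapi:BaseClass .
```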
Joshua Taylor
  • Thank you for clearing this up, it makes a lot of sense. Can you offer any suggestions moving forward? I was planning to use AllegroGraph's RDFS++ reasoner, so I currently have no compelling reason to use OWL in the first place. I was actually trying to avoid it, but I don't know how to avoid using owl:ObjectProperty, for example, so I just used it. Even if I remove the `... a owl:Ontology` header, it still chokes on the list. How can I just move on for now (besides TopBraid), as there's no time for breaking changes for at least a week? – parliament Aug 06 '13 at 17:32
  • @parliament 1) Is there a reason that you're trying to load things in Protégé in the first place? If you try to pass the document through an OWL-capable tool (e.g., Protégé) you'll probably get mangled results. 2) If you just want to declare that something is a property, then just say that ` rdf:type rdf:Property` (see [rdf:Property](http://www.w3.org/TR/rdf-schema/#ch_property) in the RDF Schema recommendation). – Joshua Taylor Aug 06 '13 at 19:46
  • @parliament Still, you should note that even if you do `hasChainTo a rdf:Property ; rdfs:domain BaseClass ; rdfs:range BaseClass`, you've got in your instance data the triple `instance1 hasChainTo ( ... )`, which means that some RDF list will also be inferred to be an instance of `BaseClass`. Is this what you were intending? – Joshua Taylor Aug 06 '13 at 19:49
  • @parliament Actually, taking another look, I have another question: why are you using the lists at all? It doesn't look like you're using any lists with length greater than one, so if you're actually trying to say `instance hasChainTo [ a NotherClass ; hasChainTo [ a YetAnotherClass ; hasChainTo [ ... ] ] ]`, you can… – Joshua Taylor Aug 06 '13 at 21:49
  • Joshua, this was the minimum example I used to reproduce the issue. In reality the lists have length > 1 and there are indeed many derived classes like you described. In Getting Started (http://protege.stanford.edu/doc/owl/getting-started.html) if you look at step 5 it asks to choose the language profile with options including Owl, Owl DL, and "RDF Schema and Owl". However, I cannot find any trace of selecting this option in Protege. Is there any way to make it "non-OWL" capable? I simply wanted to evaluate a TopBraid alternative; are you saying I cannot work in it if I'm not targeting Owl? – parliament Aug 07 '13 at 06:24
  • I tried changing `owl:ObjectProperty` to `rdf:Property` and removing the `... a owl:Ontology` header. At that point there is no mention of "owl", but it still mangles it (not to mention changing some of the tags to `owl:...`). – parliament Aug 07 '13 at 06:28
  • I think that that documentation is for the 3.x series of Protégé that supported more types of ontologies. (Even though the version is older, it looks like there's still a current release.) The [downloads page](http://protege.stanford.edu/download/registered.html#p3.5) says that “[Protégé Desktop 3.5] supports OWL 1.0, RDF(S), and Frames ontologies.” I'm pretty sure that those screenshots are from the 3.x series. – Joshua Taylor Aug 07 '13 at 11:11
  • OK, glad to hear that the lists have multiple elements in the real data. Even so, when you have `p rdfs:range C` and `x p y`, it follows that `y rdf:type C`. You've got `x p (C1 C2 ...)` which means that you'll get `(C1 C2 ...) rdf:type C` which says that the _list_ has type `C`, not the elements of the list. I.e., you're not getting `C1 rdf:type C`, `C2 rdf:type C`, and so on. Even if `rdf:type` did “distribute” over lists, since both `Ci` and `C` are classes, it seems like you might want `C1 rdfs:subClassOf C`, `C2 rdfs:subClassOf C`, and so on. – Joshua Taylor Aug 07 '13 at 11:16
  • Thanks for that observation; I did not know the type would not distribute to the items in the list. I figured I was covered with the ([ a semapi:DerivedClass; ... ]) type declaration inside each item of the list. I've asked a new question here, since this answer is more than good enough. Thanks again http://stackoverflow.com/q/18108667/1267778 – parliament Aug 07 '13 at 16:27
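The range observation from the comments can be illustrated with a small Turtle sketch. Everything under the `ex:` prefix is hypothetical and chosen only for illustration. The list is written in its expanded `rdf:first`/`rdf:rest` form with labelled blank nodes so that the list node can be referred to in the comments.

```turtle
@prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix ex:   <http://example.org/#> .   # hypothetical namespace

ex:p rdfs:range ex:C .

# Expanded form of:  ex:x ex:p ( ex:c1 ex:c2 ) .
ex:x ex:p _:list .
_:list rdf:first ex:c1 ; rdf:rest _:tail .
_:tail rdf:first ex:c2 ; rdf:rest rdf:nil .

# Entailed under RDFS (range rule): the object of ex:p is typed.
#   _:list rdf:type ex:C .
#
# Not entailed: the list members are not objects of ex:p.
#   ex:c1 rdf:type ex:C .
#   ex:c2 rdf:type ex:C .
```

If the members themselves should be typed, each one needs its own explicit `rdf:type` statement, or the property has to point at the members directly rather than at a list.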