How to make a selection of a giant ontology, built from several aligned reference ontologies?

Question

My organization has an information requirement spanning several information domains. To capture this, we are building a large organization ontology in which we align several domain-specific reference ontologies / vocabularies (think of Dublin Core, GeoSPARQL, industry-specific information models etc.). Where necessary, we add concepts in an 'extension' ontology (which is then also aligned with the reference ontologies).

The totality of this aligned ontology (>3000 classes and >10000 ObjectProperties) contains both unused concepts and semantic duplicates, and is impossible for a newcomer to navigate. Furthermore, the organization wishes to standardize the use of specific concepts, so duplicates are extremely undesirable. We are therefore looking for a way to construct the SuperAwesomeOntology, which contains all concepts (and their OWL-related predicates such as subClassOf, domain/range etc.) that have been labeled (maybe by something like dcterms:isRequiredBy "SuperAwesomeOntology"). The result should be a correct OWL ontology that can be stored in a single file.

One constraint: it has to be done programmatically (the copy/move/delete axioms interface of Protégé won't do), because if one of the reference ontologies gets an update, we want to be able to render the SuperAwesomeOntology again from its most up-to-date reference ontologies and find out if there are any conflicts.

How would we go about this? Could SPARQL do this, and if so, how? Alternative suggestions to the isRequiredBy labeling are also welcome.

Is it ontology integration? Can you assume that there are no conflicts between the domain ontologies? — UninformedUser, Sep 25 '18 at 13:19
Is it ontology integration: yes. As to conflicts, can you clarify? If you mean that there will be no logical inconsistencies as the result of the alignment, then yes: let's assume so. — Joep van Genuchten, Sep 25 '18 at 13:24
There are great tools like [LogMap](https://www.cs.ox.ac.uk/isg/projects/LogMap/) if you're looking for a Java-based tool. And indeed, this is still ongoing research; a good entry point would be [this](http://oaei.ontologymatching.org/) — UninformedUser, Sep 25 '18 at 13:27
The links you shared are interesting; however, they seem to be about the effort of (automated) alignment. My question is about how to make a selection of the total alignment once the alignment has been made (so how do you remove concepts with the same meaning, and elements of the reference ontologies that you don't need). — Joep van Genuchten, Sep 25 '18 at 13:41
I guess I'm simply too stupid to understand the whole question. A minimal example with input/output/workflow/whatever could help. Or you just wait for others here that are much smarter than me. — UninformedUser, Sep 25 '18 at 19:12
I'm not sure if this is what you're asking, but once you have identified equivalent objects, you can programmatically merge them using the OWLAPI (e.g. OWLEntityRenamer). This could also be done with SPARQL (but that is more complex). If I am on the right lines I can provide examples. Of course, the assumption is that the alignment tasks produce true equivalents, otherwise you end up over-collapsing. — Chris Mungall, Sep 26 '18 at 01:37
That is pretty close and at least part of the solution! The thing is, when you reuse ontologies (that have been developed by third parties), you end up with many classes/object properties/data properties etc. that you don't use (simply because you just don't have that information in your systems). I'm also looking for a way to get rid of all those unused concepts, such that if somebody asks 'what information is in system A' I can simply give them an ontology that describes exactly (and only) the information content of that system, expressed in terms of reference ontologies. — Joep van Genuchten, Sep 26 '18 at 15:10

Konrad Höffner · Accepted Answer · 2018-10-05T13:30:21.240

If I understand you correctly, you want to programmatically remove unused concepts from a large ontology or a collection of ontologies/graphs and you also want to remove concepts/classes that you identified as duplicates via interlinking.

Identified duplicates are easy to remove:

Define what a duplicate is for you. For example, nodes at either end of an owl:sameAs or skos:closeMatch link that are outside of your core graph (so you don't remove the "original").

Construct the new graph using a SPARQL query:

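prefix owl: <http://www.w3.org/2002/07/owl#>

construct {?s ?p ?o.}
where
{
 ?s ?p ?o.
 # drop triples whose subject has an owl:sameAs link asserted outside the core graph
 filter not exists {graph ?g {?s owl:sameAs ?x.} filter(?g!=<http://my.core.graph>)}
 # same check for the object
 filter not exists {graph ?g {?o owl:sameAs ?x.} filter(?g!=<http://my.core.graph>)}
}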

I tested this query for syntax and performance but not for correctness.
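To make the intended filtering easier to check by hand, here is a minimal Python sketch of the same idea on plain tuples. It is not a replacement for the query: the quad representation `(subject, predicate, object, graph)`, the function name `drop_linked_duplicates` and the example IRIs are assumptions made up for illustration, and only owl:sameAs is treated as a duplicate link.

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
CORE_GRAPH = "http://my.core.graph"  # same placeholder graph name as in the query


def drop_linked_duplicates(quads, core_graph=CORE_GRAPH):
    """Return the (s, p, o) triples whose subject and object have no
    owl:sameAs link asserted in a graph other than the core graph.

    `quads` is an iterable of (s, p, o, graph) tuples (an assumed,
    simplified stand-in for a real triple store).
    """
    quads = list(quads)
    # Resources that are the subject of a sameAs link outside the core graph.
    flagged = {s for (s, p, o, g) in quads
               if p == OWL_SAME_AS and g != core_graph}
    return {(s, p, o) for (s, p, o, g) in quads
            if s not in flagged and o not in flagged}


# Hypothetical example data: ref:Human is declared a duplicate of ex:Person.
example_quads = [
    ("ex:Person", "rdfs:subClassOf", "ex:Agent", CORE_GRAPH),
    ("ref:Human", OWL_SAME_AS, "ex:Person", "http://ref.graph"),
    ("ref:Human", "rdfs:subClassOf", "ref:Mammal", "http://ref.graph"),
    ("ref:Mammal", "rdfs:label", "mammal", "http://ref.graph"),
]
kept = drop_linked_duplicates(example_quads)
```

In this example every triple that mentions ref:Human is dropped, while the core triple about ex:Person and the unrelated ref:Mammal label survive. Whether that matches your own definition of a duplicate is exactly what the first step asks you to decide.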

Unused concepts are more difficult to remove:

First, again, you need to define what "unused" means for you. This criterion will almost certainly involve reachability, or "connectedness", in the combined graph, where you want to select only the graph component that contains your core ontology. The problem is that if you treat the triples as undirected edges, you will probably get a connected graph (that is, only a single component and no nodes to remove) because the type hierarchy often connects everything. You could take the direction of edges into account, that is, include only resources Y where there is a directed path from any resource X in your core ontology to Y. This would ensure that you can go up the subclass hierarchy of the target ontology until e.g. owl:Thing but not down again. The problem is that you don't know what other types of edges are in the target ontology and in which direction they go, but you could use only rdfs:subClassOf edges for now.
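The directed-reachability idea can be sketched in a few lines of Python using a breadth-first search. This is only an illustration under assumptions: triples are plain `(s, p, o)` tuples, the function name `reachable_from_core` and the example IRIs are hypothetical, and only rdfs:subClassOf edges are followed by default, as suggested above.

```python
from collections import deque

RDFS_SUBCLASS_OF = "http://www.w3.org/2000/01/rdf-schema#subClassOf"


def reachable_from_core(triples, core_resources, predicates=(RDFS_SUBCLASS_OF,)):
    """Return all resources reachable from `core_resources` by following
    the given predicates in subject -> object direction only."""
    # Adjacency list restricted to the predicates we agreed to follow.
    outgoing = {}
    for s, p, o in triples:
        if p in predicates:
            outgoing.setdefault(s, set()).add(o)

    seen = set(core_resources)
    queue = deque(seen)
    while queue:
        node = queue.popleft()
        for nxt in outgoing.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


# Hypothetical example: ex:Pump is the only class from the core ontology.
example_triples = [
    ("ex:Pump", RDFS_SUBCLASS_OF, "ref:Device"),
    ("ref:Device", RDFS_SUBCLASS_OF, "owl:Thing"),
    ("ref:Valve", RDFS_SUBCLASS_OF, "ref:Device"),   # sibling, not reached
    ("ref:Animal", RDFS_SUBCLASS_OF, "owl:Thing"),   # other branch, not reached
]
used = reachable_from_core(example_triples, {"ex:Pump"})
```

Here the search walks up from ex:Pump to ref:Device and owl:Thing but never back down to ref:Valve or ref:Animal, which is the "up but not down" behaviour described above. Passing more predicates (for example rdfs:domain or rdfs:range) widens the selection.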

Once you have sufficiently defined your "unused concept", or want to try it with some experimental definition, you can use either a graph library or a graph analysis application and import your data there.

Here is an example of how to import a SPARQL endpoint into the Cytoscape.js JavaScript graph visualization library, which can be used in Node.js as well. You will need to adapt the code heavily, however.

Or you do it in SPARQL again, using SPARQL 1.1 property paths. The problem is that those can have a large performance impact (or even a complexity that is far too large to ever complete), especially when applied to a large number of resources and an unrestricted path length. So it is possible that a query such as the following times out, but feel free to try and adapt it:

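prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>

construct {?s ?p ?o.}
where
{
 {?s ?p ?o.}
 # source points: classes that have a superclass in the core graph
 graph <http://my.core.graph> {?x rdfs:subClassOf ?X.}
 # (<>|!<>)* matches a path of any length over any predicate
 {?x (<>|!<>)* ?s.}
}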

The ?x rdfs:subClassOf ?X statement just identifies which resources of your core ontology you want to use as source points; I couldn't get a valid query without it. When I apply a graph statement to the path expression, I get a syntax error from Virtuoso.
