Do I really need OWL reasoning?

Question

I recently asked a question about the feasibility of loading about 10 million statements into a triplestore with OWL reasoning enabled.

That has led to some StackOverflow comments as well as discussion within my research group about whether or not we really need OWL reasoning.

I'll start with a real-world query that doesn't seem to require OWL reasoning.

"41167-4120-0" is the NDC Code that identifies the commercial drug product "Fexofenadine hydrochloride 180 MG Oral Tablet [Allegra]" in the US.

A slightly modified version of the NDC appears as a label in the drug ontology (specifically file dron-ndc.owl):

http://purl.obolibrary.org/obo/DRON_00604430 rdfs:label "41167412000"

DrON makes the following OWL assertions:

http://purl.obolibrary.org/obo/DRON_00604430 is a packaged drug product 
    and is rdfs:subClassOf 
    ( has_proper_part some http://purl.obolibrary.org/obo/DRON_00083688 )

http://purl.obolibrary.org/obo/DRON_00083688 
    rdfs:subClassOf http://purl.obolibrary.org/obo/DRON_00062350

http://purl.obolibrary.org/obo/DRON_00062350 has_proper_part some 
    (scattered molecular aggregate  
    and (is bearer of some active ingredient) 
    and (is bearer of some (mass and 
    (has measurement unit label value milligram) 
    and (has specified value <value> ))) 
    and (has granular part some fexofenadine))

And ChEBI says:

http://purl.obolibrary.org/obo/CHEBI_5050 rdfs:label "fexofenadine"
    subClassOf (has role some anti-allergic agent)

and

http://purl.obolibrary.org/obo/CHEBI_50857 rdfs:label "anti-allergic agent"

So, to link an NDC code to a therapeutic role, I can write a query like the following:

PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX obo: <http://purl.obolibrary.org/obo/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
select distinct 
?ndcval ?packdrugprod ?drugbrand ?brandlab ?drugform  ?api ?apilab ?drugrole
where {
    values ?ndcval {
        "41167412000" 
    }
    ?packdrugprod rdfs:subClassOf ?hasproppart ;
                  rdfs:label ?ndcval .
    ?hasproppart a owl:Restriction ;
                 owl:onProperty <http://www.obofoundry.org/ro/ro.owl#has_proper_part> ;
                 owl:someValuesFrom ?drugbrand .
    ?drugbrand rdfs:subClassOf ?drugform ;
               rdfs:label ?brandlab .
    ?drugform rdfs:subClassOf ?proppart .
    ?proppart a owl:Restriction ;
              owl:onProperty <http://www.obofoundry.org/ro/ro.owl#has_proper_part> ;
              owl:someValuesFrom ?valSource1 .
    ?valSource1 owl:intersectionOf ?intsect1 .
    # scat mol agg
    ?intsect1 rdf:first obo:OBI_0000576 .
    ?intsect1 rdf:rest ?scatmolag .
    ?scatmolag rdf:first ?bearacting .
    ?scatmolag rdf:rest ?intsect3 .
    # bearer of active ingredient
    ?bearacting a owl:Restriction ;
                owl:onProperty obo:BFO_0000053 ;
                owl:someValuesFrom obo:DRON_00000028 .
    ?intsect3 rdf:first ?granpart .
    ?intsect3 rdf:rest ?r .
    # has granular part fexofenadine
    ?granpart a owl:Restriction ;
              owl:onProperty obo:BFO_0000071 ;
              owl:someValuesFrom ?api .
    ?api rdfs:subClassOf ?rolerestr ;
         rdfs:label ?apilab .
    # has anti allergic role
    ?rolerestr a owl:Restriction ;
               owl:onProperty obo:RO_0000087 ;
               owl:someValuesFrom ?drugrole  .
    ?drugrole rdfs:label ?drlab .
    values ?drugrole {
        obo:CHEBI_50857 
    }
}

Concerns:

What about accessing nested subClass relationships without reasoning?

The example above was easy because fexofenadine is directly asserted to have the "anti-allergic" role.

What if I am interested in people taking nitrate esters? Nitroglycerin is a nitroglycerol, which is in turn a nitrate ester. If I used a repository with no reasoning enabled, I would have to explicitly use a property path to find patients who are taking any nitrate esters, with a snippet like this (right?)

?s rdfs:subClassOf* <http://purl.obolibrary.org/obo/CHEBI_51080> .
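For intuition, the rdfs:subClassOf* path is just reachability in the asserted class hierarchy, evaluated at query time. Below is a minimal Python sketch over an in-memory set of triples; the class names are hypothetical stand-ins for the ChEBI IRIs, and the hierarchy (including ethyl_nitrate and the unrelated other_drug) is assumed for illustration only:

```python
from collections import deque

SUBCLASS_OF = "rdfs:subClassOf"

# Hypothetical asserted hierarchy; labels stand in for ChEBI IRIs.
TRIPLES = [
    ("nitroglycerin", SUBCLASS_OF, "nitroglycerol"),
    ("nitroglycerol", SUBCLASS_OF, "nitrate_ester"),
    ("ethyl_nitrate", SUBCLASS_OF, "nitrate_ester"),
    ("other_drug", SUBCLASS_OF, "other_parent"),
]


def subclasses_star(triples, target):
    """Every ?s such that ?s rdfs:subClassOf* target holds (target included)."""
    # Index asserted children by parent so the hierarchy can be walked downwards.
    children = {}
    for s, p, o in triples:
        if p == SUBCLASS_OF:
            children.setdefault(o, []).append(s)

    found = {target}  # '*' allows zero steps, so target matches itself
    queue = deque([target])
    while queue:
        for child in children.get(queue.popleft(), []):
            if child not in found:  # also protects against cycles
                found.add(child)
                queue.append(child)
    return found


print(sorted(subclasses_star(TRIPLES, "nitrate_ester")))
# → ['ethyl_nitrate', 'nitrate_ester', 'nitroglycerin', 'nitroglycerol']
```

No reasoner is involved: the cost of the walk is paid on every query instead of once at load time.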

What about inferring the classes to which individuals belong?

What if my ontology says something like

:ViagraPill owl:equivalentClass ( :pill 
    and (:hasColor some :blue ) 
    and (:hasShape some :diamond))
:steelBlue rdfs:subClassOf :blue

And I have data triples that say something like

:patient1 :consumed :pill1 .
:pill1 :hasColor :steelBlue1 ;
    :hasShape :diamond1 .
:steelBlue1 a :steelBlue .
:diamond1 a :diamond.

And I want to write a query for patients who have consumed Viagra pills:

?patient a :patient ;
    :consumed ?pill .
?pill a :ViagraPill .

I would need some form of OWL reasoning for that, right?

I assume that `:ViagraPill` is a class, right? Then the axiom should be `:ViagraPill rdfs:subClassOf ...` But even then, it wouldn't be enough. To infer that `:pill1` is an instance of class `:ViagraPill` it would need an equivalence axiom, i.e. `:ViagraPill owl:equivalentClass ...` This is necessary because you want to infer "from right to left". — UninformedUser, Oct 24 '17 at 17:32
But to give you an answer to your question *"I would need some form of OWL reasoning for that, right?"* -> Yes, it would need some kind of OWL reasoning. OWL Horst, OWL RL, something like that. Sure, it depends on other axioms that should be considered. — UninformedUser, Oct 24 '17 at 17:34
Ah, just forgot: `:steelBlue1 a :blue` and `:diamond1 a :diamond` are missing in the data triples. — UninformedUser, Oct 24 '17 at 20:03
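What such a reasoner does here is the "right to left" direction of the equivalence; in OWL 2 RL this corresponds roughly to the intersection and someValuesFrom membership rules (cls-int1, cls-svf1) together with subclass propagation (cax-sco). The plain-Python sketch below hand-codes that single inference, without any reasoner library. It assumes the data also asserts :pill1 a :pill; the question's data omits that triple, and without it the :pill conjunct of the definition cannot be satisfied, so nothing would be inferred.

```python
RDF_TYPE = "rdf:type"
SUBCLASS_OF = "rdfs:subClassOf"

# Data from the question plus (:pill1 rdf:type :pill), an added assumption.
TRIPLES = {
    (":patient1", ":consumed", ":pill1"),
    (":pill1", RDF_TYPE, ":pill"),
    (":pill1", ":hasColor", ":steelBlue1"),
    (":pill1", ":hasShape", ":diamond1"),
    (":steelBlue1", RDF_TYPE, ":steelBlue"),
    (":diamond1", RDF_TYPE, ":diamond"),
    (":steelBlue", SUBCLASS_OF, ":blue"),
}


def types_of(triples, node):
    """Asserted types of node plus all their superclasses (subClassOf closure)."""
    result = {o for s, p, o in triples if s == node and p == RDF_TYPE}
    changed = True
    while changed:
        changed = False
        for s, p, o in triples:
            if p == SUBCLASS_OF and s in result and o not in result:
                result.add(o)
                changed = True
    return result


def has_some(triples, node, prop, cls):
    """True if node has at least one prop value that is an instance of cls."""
    return any(
        cls in types_of(triples, o)
        for s, p, o in triples
        if s == node and p == prop
    )


def classify_viagra_pills(triples):
    """Right-to-left half of
    :ViagraPill owl:equivalentClass
        (:pill and (:hasColor some :blue) and (:hasShape some :diamond))."""
    inferred = set()
    for node in {s for s, _, _ in triples}:
        if (
            ":pill" in types_of(triples, node)
            and has_some(triples, node, ":hasColor", ":blue")
            and has_some(triples, node, ":hasShape", ":diamond")
        ):
            inferred.add((node, RDF_TYPE, ":ViagraPill"))
    return inferred


print(classify_viagra_pills(TRIPLES))
# → {(':pill1', 'rdf:type', ':ViagraPill')}
```

A real reasoner derives such triples generically from the axioms; the point of the sketch is only that the :steelBlue to :blue subclass step and the equivalence both have to fire before the query pattern ?pill a :ViagraPill can match.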

score 2 · Answer 1 · answered Sep 18 '20 at 06:43

I always thought the tendency of OBO and other bio, life science and agri ontologies to use millions of classes but very few individuals is a mistake.

The above modeling means that for every instance of Allegra (every single pill, a box or other package) you need to infer statements like "it's a scattered molecular aggregate" and "is bearer of some active ingredient" and "has granular part some fexofenadine". I find this wasteful.

It's better to have those statements attached directly to the drug definition: as simple statements, not as Restrictions. You can do that in two ways:

Still treat Allegra as a class, but attach the props directly to it by using punning
Treat Allegra as an instance, and if you ever need to describe individual pills, use a statement like pill dct:type Allegra

Then you can access the pill properties simply through its drug (class or not class):

?pill rdf:type ?drug. # or in Variant2: dct:type
?drug obo:RO_0000087 obo:CHEBI_50857. # has anti allergic role

It'll be similar to your query, but much simpler and faster, because it avoids Restrictions.
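A minimal Python sketch of that two-hop lookup over an in-memory triple set. The drug name :Allegra180 and the pills are hypothetical; the data mixes both variants (rdf:type under punning, dct:type with the drug as an individual) only to show that the lookup is the same join either way:

```python
HAS_ROLE = "obo:RO_0000087"

# Hypothetical data: the drug carries its role as a plain triple (no Restriction).
TRIPLES = {
    (":pill1", "rdf:type", ":Allegra180"),   # variant 1: Allegra as a punned class
    (":pill2", "dct:type", ":Allegra180"),   # variant 2: Allegra as an individual
    (":pill3", "rdf:type", ":OtherDrug"),
    (":Allegra180", HAS_ROLE, "obo:CHEBI_50857"),  # has role: anti-allergic agent
}


def pills_with_role(triples, role, type_props=("rdf:type", "dct:type")):
    """?pill <type_prop> ?drug . ?drug obo:RO_0000087 role ."""
    drugs = {s for s, p, o in triples if p == HAS_ROLE and o == role}
    return {s for s, p, o in triples if p in type_props and o in drugs}


print(sorted(pills_with_role(TRIPLES, "obo:CHEBI_50857")))
# → [':pill1', ':pill2']
```

Compare this with the original query, which has to match owl:Restriction, owl:onProperty and owl:someValuesFrom nodes at every level before reaching the role.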

(As for the need to parse rdf:Lists, that must be a heavy burden on the consciences of ontology creators):

    ?intsect1 rdf:first obo:OBI_0000576 .
    ?intsect1 rdf:rest ?scatmolag .
    ?scatmolag rdf:first ?bearacting .
    ?scatmolag rdf:rest ?intsect3 .
    # bearer of active ingredient
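Reading such a list is mechanical: follow rdf:first/rdf:rest until rdf:nil. A small Python sketch; the blank-node names are hypothetical, and the three members simply mirror the order the query above walks (scattered molecular aggregate, bearer of active ingredient, granular part):

```python
RDF_FIRST, RDF_REST, RDF_NIL = "rdf:first", "rdf:rest", "rdf:nil"

# Hypothetical blank nodes; the members mirror the order the query walks.
TRIPLES = {
    ("_:l1", RDF_FIRST, "obo:OBI_0000576"),   # scattered molecular aggregate
    ("_:l1", RDF_REST, "_:l2"),
    ("_:l2", RDF_FIRST, "_:bearerOfActiveIngredient"),
    ("_:l2", RDF_REST, "_:l3"),
    ("_:l3", RDF_FIRST, "_:hasGranularPartFexofenadine"),
    ("_:l3", RDF_REST, RDF_NIL),
}


def read_rdf_list(triples, head):
    """Return the members of the rdf:List starting at head, in order.

    Raises KeyError if a node lacks rdf:first/rdf:rest, ValueError on a cycle.
    """
    index = {(s, p): o for s, p, o in triples if p in (RDF_FIRST, RDF_REST)}
    items, node, seen = [], head, set()
    while node != RDF_NIL:
        if node in seen:
            raise ValueError(f"cyclic rdf:List at {node}")
        seen.add(node)
        items.append(index[(node, RDF_FIRST)])
        node = index[(node, RDF_REST)]
    return items


print(read_rdf_list(TRIPLES, "_:l1"))
# → ['obo:OBI_0000576', '_:bearerOfActiveIngredient', '_:hasGranularPartFexofenadine']
```

In SPARQL the same walk has to be spelled out one rdf:first/rdf:rest pair per member, which is why the query grows with the length of the intersection.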

Here's your Viagra example in simplified form. I've turned the nomenclature values :blue and :diamond into individuals (skos:Concept) because I see no reason for them to be classes (:steelBlue1 makes no sense to me).

:ViagraPill a :DrugForm;
  :hasColor :blue;
  :hasShape :diamond.
:steelBlue a skos:Concept;
  skos:broader :blue.

:patient1 :consumed :pill1.
:pill1 :hasColor :steelBlue; :hasShape :diamond.

The color and shape are a necessary but not sufficient condition for identifying the drug, so ?drugForm below is a possible drug form of that pill, not a certain one:

select ?patient ?drugForm {
  ?patient a :patient; :consumed ?pill.
  ?pill :hasColor ?color; :hasShape ?shape.
  ?drugForm a :DrugForm; :hasColor ?color1; :hasShape ?shape.
  ?color skos:broaderTransitive? ?color1
}

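Here I've relied on transitive reasoning: in the SKOS schema, skos:broader is a subproperty of the transitive property skos:broaderTransitive, so the reasoner materializes the closure, and the zero-or-one path skos:broaderTransitive? is faster than evaluating the path skos:broader* at query time.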
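To make the matching logic concrete, here is a plain-Python sketch of the same query. It emulates skos:broaderTransitive? by taking each pill color plus all its broader concepts, and it only considers subjects typed :DrugForm so that :pill1 (which carries the same properties) is not returned as a drug form of itself. The triple :patient1 a :patient is an added assumption; the data above does not assert it.

```python
# Simplified data from above; (:patient1 rdf:type :patient) is an added assumption.
TRIPLES = {
    (":ViagraPill", "rdf:type", ":DrugForm"),
    (":ViagraPill", ":hasColor", ":blue"),
    (":ViagraPill", ":hasShape", ":diamond"),
    (":steelBlue", "rdf:type", "skos:Concept"),
    (":steelBlue", "skos:broader", ":blue"),
    (":patient1", "rdf:type", ":patient"),
    (":patient1", ":consumed", ":pill1"),
    (":pill1", ":hasColor", ":steelBlue"),
    (":pill1", ":hasShape", ":diamond"),
}


def objects(triples, s, p):
    return {o for s2, p2, o in triples if s2 == s and p2 == p}


def subjects(triples, p, o):
    return {s for s, p2, o2 in triples if p2 == p and o2 == o}


def broader_or_self(triples, concept):
    """concept plus every concept reachable via skos:broader, i.e. what the
    materialized skos:broaderTransitive? path would match."""
    result, frontier = {concept}, [concept]
    while frontier:
        for parent in objects(triples, frontier.pop(), "skos:broader"):
            if parent not in result:
                result.add(parent)
                frontier.append(parent)
    return result


def candidate_drug_forms(triples):
    """(patient, drugForm) pairs whose nominal color is the pill's color or a
    broader one, and whose shape equals the pill's shape."""
    pairs = set()
    for patient in subjects(triples, "rdf:type", ":patient"):
        for pill in objects(triples, patient, ":consumed"):
            colors = set()
            for c in objects(triples, pill, ":hasColor"):
                colors |= broader_or_self(triples, c)
            shapes = objects(triples, pill, ":hasShape")
            for form in subjects(triples, "rdf:type", ":DrugForm"):
                if (objects(triples, form, ":hasColor") & colors
                        and objects(triples, form, ":hasShape") & shapes):
                    pairs.add((patient, form))
    return pairs


print(candidate_drug_forms(TRIPLES))
# → {(':patient1', ':ViagraPill')}
```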

Reasoning is not an all-or-nothing affair: you can pick and choose the rules you need from the built-in rulesets. E.g., if you include RDFS reasoning, then you can simplify:

?x a ?s. ?s rdfs:subClassOf* :CHEBI_51080

to just

?x a :CHEBI_51080

The default built-in ruleset RDFS-Plus-optimized includes RDFS plus support for inverse and transitive properties. See http://graphdb.ontotext.com/documentation/enterprise/rules-optimisations.html for more advice.
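What the RDFS ruleset buys you is materialization at load time. Below is a minimal Python sketch of two RDFS entailment rules, rdfs11 (subclass transitivity) and rdfs9 (type propagation), forward-chained to a fixpoint; afterwards ?x a :CHEBI_51080 is a plain lookup. The individual :dose42 and the intermediate class names are hypothetical, and a real store such as GraphDB uses its own optimized rule engine rather than anything like this:

```python
RDF_TYPE, SUBCLASS_OF = "rdf:type", "rdfs:subClassOf"

# Hypothetical data: one individual typed with the most specific class only.
TRIPLES = {
    (":nitroglycerin", SUBCLASS_OF, ":nitroglycerol"),
    (":nitroglycerol", SUBCLASS_OF, ":CHEBI_51080"),  # nitrate ester
    (":dose42", RDF_TYPE, ":nitroglycerin"),
}


def materialize_rdfs_types(triples):
    """Forward-chain two RDFS entailment rules until nothing new is derived:
    rdfs11: (A subClassOf B), (B subClassOf C) => (A subClassOf C)
    rdfs9:  (x type A), (A subClassOf B)       => (x type B)
    """
    closure = set(triples)
    while True:
        sub = {(s, o) for s, p, o in closure if p == SUBCLASS_OF}
        new = {(a, SUBCLASS_OF, c) for a, b in sub for b2, c in sub if b == b2}
        new |= {
            (x, RDF_TYPE, b)
            for x, p, a in closure
            if p == RDF_TYPE
            for a2, b in sub
            if a == a2
        }
        if new <= closure:
            return closure
        closure |= new


closure = materialize_rdfs_types(TRIPLES)
print(sorted(x for x, p, o in closure if p == RDF_TYPE and o == ":CHEBI_51080"))
# → [':dose42']
```

The trade-off is the one discussed in this thread: the inferred triples cost storage and load time once, in exchange for simpler and faster queries.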

You may object: "Didn't you say you'll attach props directly to drug (classes)? How come you also attach them to :pill1 above?".

I think that's fine: we could declare those props to have domain :DrugIndividual or :DrugForm or :Drug, and interpret them as "observed" for :DrugIndividual but "nominal" or "required" for :DrugForm and :Drug. BTW, I like to declare polymorphic domains using schema:domainIncludes ... instead of rdfs:domain [a owl:Class; owl:unionOf (...)].

If you don't want to attach props to drug individuals (instances), then you'd have to use an "unknown class" for the pill, e.g. like this:

:patient1 :consumed :pill1.
:pill1 a [:hasColor :steelBlue; :hasShape :diamond].

With the respective slight complication in querying:

select ?patient ?drugForm {
  ?patient a :patient; :consumed ?pill.
  ?pill a [:hasColor ?color; :hasShape ?shape].
  ?drugForm a :DrugForm; :hasColor ?color1; :hasShape ?shape.
  ?color skos:broaderTransitive? ?color1
}

To summarize:

GraphDB does not support OWL DL reasoning; it supports the OWL 2 QL and OWL 2 RL profiles.
OBO-style ontologies use millions of classes, which requires inferring a whole bunch of props from class Restrictions down to individuals; I find that wasteful.
