multiple values getting printed for rdf:type

Question

I have created a class Asset, which is a subclass of http://schema.org/CreativeWork. Asset has two subclasses, Article and Publication.

I will only ever have instances of Article or Publication; Asset is an abstract class.

When I print the metadata for an article or publication, I also want to print its type, which in this case would be Article or Publication.

I run the following query:

SELECT ?id ?title ?type
WHERE
{
   ?asset rdf:type Asset ;
          somePrefix:id ?id ;
          somePrefix:title ?title ;
          rdf:type ?type .
}

Instead of getting a single type (Article or Publication) for every Asset, I get multiple values for rdf:type. Example:

 id  title                    Type
 1   this is a article        CreativeWork
 1   this is a article        Asset
 1   this is a article        Article
 2   this is a publication    CreativeWork
 2   this is a publication    Asset
 2   this is a publication    Publication

I want to print only Article or Publication in the type column.

How can I achieve this?

score 3 · Accepted Answer · edited May 23 '17 at 12:11

As I understand you, your class hierarchy is:

CreativeWork
  Asset
    Article
    Publication

You have a few options.

Getting nothing but Article and Publication

The simplest is to say that you only want to consider values of ?type that are Article and Publication, in which case you can specify this with values:

SELECT ?id ?title ?type
WHERE
{
   values ?type { Article Publication }
   ?asset rdf:type Asset ;
          somePrefix:id ?id ;
          somePrefix:title ?title ;
          rdf:type ?type .
}

This is the most specific thing that you can do, and you are guaranteed that ?type will only be Article or Publication.

Getting everything but CreativeWork and Asset

Of course, you might define other subclasses later, and you might not want to have to add more types to the values block every time you do that. You might consider simply filtering out CreativeWork and Asset, then:

SELECT ?id ?title ?type
WHERE
{
   ?asset rdf:type Asset ;
          somePrefix:id ?id ;
          somePrefix:title ?title ;
          rdf:type ?type .
   filter ( ?type != Asset && ?type != CreativeWork )
}

You can also do that filter with:

filter ( ?type NOT IN ( Asset, CreativeWork ) )
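To see how the allow-list (values) and the deny-list (filter) drift apart once a new subclass shows up, here is a rough sketch in plain Python. It only imitates the result rows; it does not run SPARQL. The rows and the Report class are invented for illustration and are not part of the question's data.

```python
# Hypothetical (id, title, type) rows, as the unfiltered query might return them.
# "Report" is an invented later subclass of Asset, used only for illustration.
ROWS = [
    (1, "this is a article", "CreativeWork"),
    (1, "this is a article", "Asset"),
    (1, "this is a article", "Article"),
    (2, "this is a publication", "CreativeWork"),
    (2, "this is a publication", "Asset"),
    (2, "this is a publication", "Publication"),
    (3, "this is a report", "CreativeWork"),
    (3, "this is a report", "Asset"),
    (3, "this is a report", "Report"),
]


def keep_only(rows, allowed):
    """Imitates: values ?type { ... } (an allow-list)."""
    return [row for row in rows if row[2] in allowed]


def drop(rows, excluded):
    """Imitates: filter ( ?type NOT IN ( ... ) ) (a deny-list)."""
    return [row for row in rows if row[2] not in excluded]


allow_listed = keep_only(ROWS, {"Article", "Publication"})
deny_listed = drop(ROWS, {"Asset", "CreativeWork"})

print([row[2] for row in allow_listed])  # ['Article', 'Publication']
print([row[2] for row in deny_listed])   # ['Article', 'Publication', 'Report']
```

The allow-list silently hides the Report row until someone updates it, while the deny-list picks it up automatically. Which behaviour is preferable depends on whether new subclasses should appear in the output without a query change.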

Getting only maximally specific classes

This doesn't make any guarantee about what classes you could have, though, and if you later add subclasses of Article or Publication, e.g., JournalArticle ⊑ Article, then you'd get results that include both Article and JournalArticle, and you might not want that. What you might want instead is the "most specific" class for an individual. That is, you want the class C of an individual such that the individual has no other type D ⊑ C. (Note that the "other" there is important, since C ⊑ C.) The general idea is captured in "How to get Least common subsumer in ontology using SPARQL Query?", along with some other questions, but it's easy to reproduce the important part here:

SELECT ?id ?title ?type
WHERE
{
   ?asset rdf:type Asset ;
          somePrefix:id ?id ;
          somePrefix:title ?title ;
          rdf:type ?type .

   filter not exists {                    # Don't take ?type as a result if
      ?asset rdf:type ?subtype .          # ?asset has some other ?subtype 
      ?subtype rdfs:subClassOf* ?type .   # that is a subclass of ?type 
      filter ( ?subtype != ?type )        # (other than ?type itself).
   }
}

This will get you the "deepest" class in the hierarchy that an individual has. This still could return multiple results, if your individual is a member of classes such that neither is a subclass of the other. Of course, in that case, you'd probably still be interested in all the results.
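The logic of that filter can be traced by hand with a small in-memory model. The following is a minimal Python sketch, not a SPARQL engine. It assumes each class has at most one direct superclass, and the asset names and the JournalArticle class are illustrative only.

```python
# Direct rdfs:subClassOf edges (child -> parent). Assumption: single parent per class.
SUBCLASS_OF = {
    "Asset": "CreativeWork",
    "Article": "Asset",
    "Publication": "Asset",
    "JournalArticle": "Article",  # hypothetical deeper subclass
}


def ancestors_or_self(cls):
    """Imitates rdfs:subClassOf* : the class itself plus all of its superclasses."""
    result = {cls}
    while cls in SUBCLASS_OF:
        cls = SUBCLASS_OF[cls]
        result.add(cls)
    return result


def most_specific(types):
    """Keep a type only if no *other* type of the individual is a subclass of it."""
    return {
        t for t in types
        if not any(s != t and t in ancestors_or_self(s) for s in types)
    }


# Hypothetical individuals with all of their rdf:type values.
TYPES = {
    "asset1": {"CreativeWork", "Asset", "Article"},
    "asset2": {"CreativeWork", "Asset", "Publication"},
    "asset3": {"CreativeWork", "Asset", "Article", "JournalArticle"},
}

for asset, types in sorted(TYPES.items()):
    print(asset, sorted(most_specific(types)))
# asset1 ['Article']
# asset2 ['Publication']
# asset3 ['JournalArticle']
```

An individual typed as both Article and Publication would keep both, which matches the caveat about classes where neither is a subclass of the other.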

This snippet worked for me. `filter ( ?type NOT IN ( Asset, CreativeWork ) ).` — Anubhav, Aug 04 '14 at 15:26

vefthym · Answer 2 · 2014-07-31T16:25:15.157

0

You are getting this output because all of your Assets are typed as both Asset and CreativeWork, in addition to Article or Publication. If you only want to print the subclasses of Asset, you can use the following query to restrict the values of ?type (it is the same as yours, with one extra line):

SELECT ?id ?title ?type
WHERE
{
     ?asset rdf:type Asset ;
            somePrefix:id ?id ;
            somePrefix:title ?title ;
            rdf:type ?type .
     ?type rdfs:subClassOf Asset . 
}

where rdfs is the namespace prefix of http://www.w3.org/2000/01/rdf-schema#.
It means that the ?type should be a subclass of Asset only.

The first restriction (?asset rdf:type Asset) is not actually needed, but I leave it for clarity, since you have it in your initial query. You can safely skip it though.

edited Jul 31 '14 at 16:25

answered Jul 31 '14 at 16:19

vefthym


This won't work if RDFS reasoning is enabled, since Asset is a subclass of itself. If the class hierarchy becomes any deeper, though, it won't work unless RDFS reasoning is enabled, because you'll be limiting the results to the immediate subclasses of Asset. – Joshua Taylor Jul 31 '14 at 17:15
Valid points, but these "ifs" do not hold in the OP. Anyway, your answer covers all these cases (and even more), so you get +1 from me :) – vefthym Jul 31 '14 at 17:36
Thanks for the +1, but I wouldn't be too sure that *these "ifs" do not hold in the OP*. Is OP actually currently asserting that `[] rdf:type CreativeWork, Asset, Article`? It's possible, but most people just assert the most specific type in their data (in this case `[] rdf:type Article`). If OP is doing that, then the CreativeWork and Asset results might be coming from RDFS inference rather than explicit data. – Joshua Taylor Jul 31 '14 at 17:39
Explicit data can be a result of RDFS reasoning (which explains why a resource can be CreativeWork, Asset and Article), and in no dataset is there an infinite number of `owl:Thing rdfs:subClassOf owl:Thing` triples. Just tried that at dbpedia sparql endpoint and it works. We can make assumptions about the OP for hours, but I don't think there is a point. – vefthym Jul 31 '14 at 18:02
I'm sorry, I'm not sure what you mean by "Explicit data can be a result of RDFS reasoning," unless you're pointing out that explicit data may be reiterated in the output of a reasoner. By "explicit data" I mean the data that's actually in the file/ontology/model over which OP is querying. Implicit data, on the other hand, is the logical consequence of that data, typically produced by a reasoner. So, given explicit data `x a A. A subClassOf B`, an RDFS reasoner could materialize the implicit data: `x a B. A subClassOf A. B subClassOf B.` – Joshua Taylor Jul 31 '14 at 18:06
My point was that after adding `?type rdfs:subClassOf Asset`, if an RDFS reasoner is being used, ?type can still be bound to Asset, because an RDFS reasoner will produce the triple `Asset rdfs:subClassOf Asset`. – Joshua Taylor Jul 31 '14 at 18:08
Yes, I got your point and it is valid. What I meant was what you refer to as "implicit data" then, stored in a triple store (what I previously referred to as "dataset"). Then, when you make a SPARQL query, no RDFS reasoner is involved. – vefthym Jul 31 '14 at 18:12
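
The point debated in these comments can be made concrete with a small Python sketch. It compares what a pattern like `?type rdfs:subClassOf Asset` can match against only the asserted edges versus a store that also holds the reflexive and transitive subclass triples an RDFS reasoner would add. The edge set, including JournalArticle, is an assumption for illustration, and the closure function is a toy, not a real reasoner.

```python
# Asserted (child, parent) rdfs:subClassOf edges; JournalArticle is hypothetical.
DIRECT = {
    ("Asset", "CreativeWork"),
    ("Article", "Asset"),
    ("Publication", "Asset"),
    ("JournalArticle", "Article"),
}


def rdfs_closure(edges):
    """Toy materialization: add reflexive and transitive subClassOf edges."""
    classes = {c for edge in edges for c in edge}
    closed = set(edges) | {(c, c) for c in classes}
    changed = True
    while changed:
        changed = False
        for a, b in list(closed):
            for c, d in list(closed):
                if b == c and (a, d) not in closed:
                    closed.add((a, d))
                    changed = True
    return closed


def subclasses_of(edges, parent):
    """Imitates the bindings of ?type in: ?type rdfs:subClassOf <parent>."""
    return {child for child, p in edges if p == parent}


print(sorted(subclasses_of(DIRECT, "Asset")))
# ['Article', 'Publication']
print(sorted(subclasses_of(rdfs_closure(DIRECT), "Asset")))
# ['Article', 'Asset', 'JournalArticle', 'Publication']
```

Without the inferred triples, Asset is excluded but JournalArticle is missed; with them, JournalArticle is found but Asset comes back as a subclass of itself.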
