
I have two issues:

ISSUE 1

This first issue is reasonably well documented here on Stack Overflow, but no one else seems to be getting the same results as me, so I thought it would be useful to ask about it here. When I run the following query:

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>        
PREFIX type: <http://dbpedia.org/class/yago/>
PREFIX prop: <http://dbpedia.org/ontology/>

SELECT DISTINCT ?person ?commonName ?nationality WHERE {
    ?person a dbpedia-owl:Person ;  
              dbpedia-owl:commonName ?commonName . FILTER(lang(?commonName) = 'en')
    ?person a dbpedia-owl:Person;
              dbpedia-owl:birthDate ?birthDate 
}
LIMIT 30

I get this list of people:

SPARQL RESULTS

Great. Now I try to cut out the duplicates (like Abbas Suan, who appears three times in three separate languages; I want to keep the English one), so I do this:

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>        
PREFIX type: <http://dbpedia.org/class/yago/>
PREFIX prop: <http://dbpedia.org/ontology/>

SELECT DISTINCT ?person (SAMPLE(?commonName) as ?commonName) ?birthDate WHERE {
    ?person a dbpedia-owl:Person ;  
              dbpedia-owl:commonName ?commonName . FILTER(lang(?commonName) = 'en')
    ?person a dbpedia-owl:Person;
              dbpedia-owl:birthDate ?birthDate 
}
LIMIT 30

With these results: NEW SPARQL RESULTS

So it seems to me that I get two completely different lists of people. How can I know that I'm not losing people this way? I am trying to download every single person on Wikipedia with certain attributes, which is a nice segue to issue 2.

ISSUE 2

The above query works fine for those two attributes. However, when I try to add the nationality and knownFor attributes (so we know what each person did and where they come from), the query breaks, even though all of these attributes are listed on the same page for Person in the DBpedia ontology.

This code shows nothing for the nationality and knownFor fields:

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>        
PREFIX type: <http://dbpedia.org/class/yago/>
PREFIX prop: <http://dbpedia.org/ontology/>

SELECT DISTINCT ?person (SAMPLE(?commonName) as ?commonName) ?birthDate ?nationality ?knownFor WHERE {
    ?person a dbpedia-owl:Person ;  
              dbpedia-owl:commonName ?commonName . FILTER(lang(?commonName) = 'en')
    ?person a dbpedia-owl:Person;
              dbpedia-owl:birthDate ?birthDate .
    OPTIONAL {?person a dbpedia-owl:Person;
              dbpedia-owl:nationality ?nationality .}
    OPTIONAL {?person a dbpedia-owl:Person;
              dbpedia-owl:knownFor ?knownFor .}
}
LIMIT 30

with these results:

SPARQL RESULTS 3

Any help with either issue would be much appreciated. Thanks!

Alex Chumbley
  • Please don't post multiple questions as a single question; in future, please post each question separately. Also, several of your results links are broken. – RobV Jul 15 '13 at 23:55
  • Also you haven't included all the prefix declarations so these queries can't be cut and pasted into arbitrary SPARQL tools. – RobV Jul 15 '13 at 23:57
  • The first query gives an error: `Virtuoso 37000 Error SP030: SPARQL compiler, line 1: Missing in PREFIX declaration at '<' before 'http:' SPARQL query: define sql:big-data-const 0 define input:default-graph-uri PREFIX rdfs:` – Joshua Taylor Jul 16 '13 at 02:07

1 Answer


Issue 1

Your first query projects a variable, ?nationality, that isn't used anywhere in the query pattern. This is legal SPARQL, but the Virtuoso compiler errors on it. Your second query, on the other hand, is actually illegal SPARQL syntax: you can't assign a variable to itself, as (SAMPLE(?commonName) as ?commonName) does. Virtuoso is notoriously non-standard in its interpretation of some parts of the SPARQL specification, so we'll ignore that.

The only real difference between your queries (other than the spurious SAMPLE()) is that they don't select the same set of variables, so the DISTINCT operator may be discarding a different set of rows.
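A minimal sketch of a standards-compliant version of the second query, assuming `dbpedia-owl:` is bound to `http://dbpedia.org/ontology/` (the same IRI the question binds to `prop:`). The raw English names go into a new, illustrative variable `?name`, so the SAMPLE result can be assigned to `?commonName` without reusing an in-scope variable. SPARQL 1.1 also forbids projecting non-aggregated variables alongside an aggregate unless they are grouped, so `?person` and `?birthDate` appear in `GROUP BY`.

```sparql
PREFIX dbpedia-owl: <http://dbpedia.org/ontology/>

# ?name is an illustrative variable holding the raw English names;
# SAMPLE picks one of them per group into the new variable ?commonName.
SELECT ?person (SAMPLE(?name) AS ?commonName) ?birthDate
WHERE {
  ?person a dbpedia-owl:Person ;
          dbpedia-owl:commonName ?name ;
          dbpedia-owl:birthDate ?birthDate .
  FILTER(lang(?name) = 'en')
}
# Every projected variable that is not an aggregate must be grouped on.
GROUP BY ?person ?birthDate
LIMIT 30
```

A person with several `birthDate` values still yields one row per date; sampling `?birthDate` as well (into another new variable) and grouping on `?person` alone would give exactly one row per person.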

That said, a SPARQL engine is not required to return results in any consistent order, so when you use LIMIT there is no guarantee that the engine gives you the same rows each time. You can add an ORDER BY clause to force the engine to sort the results, which should ensure you get the same results each time you use LIMIT, but this will make your queries slower.
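A sketch of how ORDER BY combines with LIMIT and OFFSET to page through the whole result set repeatably, assuming the same `dbpedia-owl:` binding as above. Ordering on every projected variable gives a total order over the distinct rows, so as long as the data doesn't change between requests, consecutive pages neither overlap nor skip rows. The page size of 1000 is an arbitrary choice, and a public endpoint may cap how many rows or how large an offset it accepts.

```sparql
PREFIX dbpedia-owl: <http://dbpedia.org/ontology/>

SELECT DISTINCT ?person ?commonName ?birthDate
WHERE {
  ?person a dbpedia-owl:Person ;
          dbpedia-owl:commonName ?commonName ;
          dbpedia-owl:birthDate ?birthDate .
  FILTER(lang(?commonName) = 'en')
}
# Sorting on all projected variables makes each page deterministic.
ORDER BY ?person ?commonName ?birthDate
LIMIT 1000
OFFSET 0    # next pages: OFFSET 1000, OFFSET 2000, ...
```

Only the OFFSET value changes between requests; everything else stays identical so the sort order is the same each time.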

Issue 2

Repeating ?person a dbpedia-owl:Person inside the OPTIONAL is technically redundant, since the mandatory part of the query already requires it, though in practice it may improve performance. Are you sure those properties are present for every person being returned?
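A sketch of the Issue 2 query with the redundant type patterns dropped from the OPTIONAL blocks, assuming the same `dbpedia-owl:` binding. It uses plain DISTINCT instead of the self-assigning SAMPLE so that it is valid SPARQL; for a person who lacks nationality or knownFor, that column is simply left unbound.

```sparql
PREFIX dbpedia-owl: <http://dbpedia.org/ontology/>

SELECT DISTINCT ?person ?commonName ?birthDate ?nationality ?knownFor
WHERE {
  # Mandatory part: the type is asserted once, here.
  ?person a dbpedia-owl:Person ;
          dbpedia-owl:commonName ?commonName ;
          dbpedia-owl:birthDate ?birthDate .
  FILTER(lang(?commonName) = 'en')
  # Each OPTIONAL only needs the property it adds; ?person is already typed.
  OPTIONAL { ?person dbpedia-owl:nationality ?nationality }
  OPTIONAL { ?person dbpedia-owl:knownFor ?knownFor }
}
LIMIT 30
```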

It is entirely possible that, because of the LIMIT, Virtuoso is favouring solutions for which it does not need to evaluate the OPTIONAL clauses, thus saving itself work. Removing the OPTIONAL to make these patterns mandatory will tell you whether this is the case or whether the entries simply don't have those properties.

For example, this entry from the results has neither of your optional properties, which suggests the latter is the case.
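To check the data itself rather than a 30-row sample, here is a sketch of a counting query, assuming the same `dbpedia-owl:` binding. Each UNION branch binds only its own person variable, and COUNT skips unbound values, so the two counts are independent of each other. Counting over all of DBpedia may be slow or time out on a public endpoint.

```sparql
PREFIX dbpedia-owl: <http://dbpedia.org/ontology/>

# How many persons carry each optional property at all?
SELECT (COUNT(DISTINCT ?p1) AS ?withNationality)
       (COUNT(DISTINCT ?p2) AS ?withKnownFor)
WHERE {
  { ?p1 a dbpedia-owl:Person ; dbpedia-owl:nationality ?nat }
  UNION
  { ?p2 a dbpedia-owl:Person ; dbpedia-owl:knownFor ?known }
}
```

If either count is zero or very small relative to the number of persons, empty columns in the OPTIONAL query reflect missing data rather than a query problem.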

RobV
  • I've included all the prefixes that I used, whether or not they are the right ones to use. The code I've been using is pretty hit or miss, with the Virtuoso compiler crashing and saying "This site is under maintenance" half the time. Ignoring the ?nationality (which was just a typo on my end), those were the results I got. Sorry they seemed broken; they work hit or miss for me, along with the compiler. The names are still different, though, and I don't know why. – Alex Chumbley Jul 16 '13 at 13:34
  • And when I take out the OPTIONAL for the second issue, none of the fields are populated, not even birthDate and commonName. Does that mean that those two props don't exist? – Alex Chumbley Jul 16 '13 at 13:46