
I am using the following query to get the Wikidata ID of a DBpedia page via owl:sameAs.

SELECT distinct ?wikidata_concept 
WHERE {<http://dbpedia.org/resource/Category:Michael_Jackson> owl:sameAs ?wikidata_concept 
FILTER(regex(str(?wikidata_concept), "www.wikidata.org" ) )} 
LIMIT 100

It works fine in the Virtuoso SPARQL Query Editor: I get http://www.wikidata.org/entity/Q7215695 as the answer, which is correct.

However, when I try to do the same using SPARQLWrapper in Python, I don't get this answer (basically, the data frame is empty).

My Python code is as follows.

import pandas as pd
from SPARQLWrapper import SPARQLWrapper, JSON

sparql = SPARQLWrapper("http://live.dbpedia.org/sparql")
item = "http://dbpedia.org/resource/Category:Michael_Jackson"
# Doubled braces {{ }} produce literal SPARQL braces inside the f-string
sparql.setQuery(f"""
    SELECT DISTINCT ?wikidata_concept
    WHERE {{
        <{item}> owl:sameAs ?wikidata_concept
        FILTER(regex(str(?wikidata_concept), "www.wikidata.org"))
    }}
    LIMIT 100
""")
sparql.setReturnFormat(JSON)
results = sparql.query().convert()
print(results)
# pd.json_normalize replaces the deprecated pd.io.json.json_normalize
results_df = pd.json_normalize(results['results']['bindings'])
print(results_df)

Please let me know what I am doing wrong. I am happy to provide more details if needed.
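An empty DataFrame here does not by itself mean the request failed: when nothing matches, the endpoint still returns a well-formed SPARQL 1.1 JSON result whose `results.bindings` list is empty, and `json_normalize` turns that into an empty frame. The stdlib-only sketch below (the helper name `bindings_to_rows` is hypothetical) flattens that structure in a similar way, which makes the difference between a hit and a miss easy to see:

```python
from typing import Dict, List


def bindings_to_rows(results: dict) -> List[Dict[str, str]]:
    """Flatten a SPARQL 1.1 JSON result into one dict per solution."""
    variables = results["head"]["vars"]
    rows = []
    for binding in results["results"]["bindings"]:
        # A variable left unbound in a solution is simply absent from it.
        rows.append({v: binding[v]["value"] for v in variables if v in binding})
    return rows


# A result with one match (the value reported from the query editor).
hit = {
    "head": {"vars": ["wikidata_concept"]},
    "results": {"bindings": [
        {"wikidata_concept": {"type": "uri",
                              "value": "http://www.wikidata.org/entity/Q7215695"}},
    ]},
}
# A result with no matches: still well-formed, but "bindings" is empty.
miss = {"head": {"vars": ["wikidata_concept"]}, "results": {"bindings": []}}

print(bindings_to_rows(hit))   # [{'wikidata_concept': 'http://www.wikidata.org/entity/Q7215695'}]
print(bindings_to_rows(miss))  # []
```

Checking `len(results["results"]["bindings"])` before building the DataFrame is a quick way to tell "the query matched nothing" apart from a parsing problem.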

EmJ
  • In the code you used the DBpedia Live endpoint; the web GUI in your link is the normal DBpedia endpoint... – UninformedUser Jun 23 '19 at 10:25
  • I also don't see why you used the category `http://dbpedia.org/resource/Category:Michael_Jackson` instead of the entity itself, i.e. `http://dbpedia.org/resource/Michael_Jackson`, which would indeed work. – UninformedUser Jun 23 '19 at 10:28
  • @AKSW Thanks a lot for the valuable comments. I tried `http://dbpedia.org/resource/Michael_Jackson`; however, it did not work. When I changed to `SPARQLWrapper("http://dbpedia.org/sparql")` it worked. Thanks a lot :) I'm just wondering what the difference between the `live` and `normal` DBpedia is. What do you recommend for industry-level applications? Looking forward to hearing from you :) – EmJ Jun 23 '19 at 10:40
  • The DBpedia Live endpoint is more or less in sync with Wikipedia, i.e. changes to Wikipedia are reflected in DBpedia Live within a few minutes (or maybe a bit longer). The "normal" DBpedia endpoint contains just a dump of Wikipedia transformed to RDF and is usually outdated. Look at https://wiki.dbpedia.org/develop/datasets: the latest dataset is from 2016... not sure whether they ever loaded a newer dump and forgot to upload it to their web page. – UninformedUser Jun 23 '19 at 10:58
  • Using `http://dbpedia.org/resource/Michael_Jackson` also works on DBpedia Live; it's just your filter that doesn't work, as they link to a different Wikidata URI style (without `www`). See `SELECT distinct ?wikidata_concept WHERE { owl:sameAs ?wikidata_concept } LIMIT 100` – UninformedUser Jun 23 '19 at 11:02
  • @AKSW Thanks a lot. I found the details you posted very useful. As you mentioned, it seems the problem is with my query. When I ran `SELECT distinct ?wikidata_concept WHERE { owl:sameAs ?wikidata_concept } LIMIT 100`, it returned all the `owl:sameAs` links. I only want the Wikidata entry. Is there a way to change my query using a different `FILTER`? Looking forward to hearing from you. Thank you very much :) – EmJ Jun 23 '19 at 11:37
  • Watch the URIs and then look at your filter. I'm sure you will figure out why they don't match the regex, and also how to change the filter. – UninformedUser Jun 23 '19 at 12:16
  • @AKSW Thanks a lot. Sure, I will spend some time changing my filter. BTW, I'm just curious why DBpedia entities such as `http://dbpedia.org/resource/Word2vec` do not have any `owl:sameAs` in the `live` version. Basically, this query does not return anything at all: `SELECT distinct ?wikidata_concept WHERE { owl:sameAs ?wikidata_concept } LIMIT 100` :) – EmJ Jun 23 '19 at 12:21
  • Your last query does return something... – UninformedUser Jun 23 '19 at 16:07
  • And regarding the Live version, look at `http://live.dbpedia.org/page/Word2vec` in the browser. There is no such data. And no, I don't know why; I'm not responsible for DBpedia, nor do I work for the project. Ask on their mailing list; there is a lot of missing data currently. – UninformedUser Jun 23 '19 at 16:09
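To see why the filter from the question can match nothing on DBpedia Live, the sketch below applies the question's pattern and a relaxed one using Python's `re` module. Treating Python regex syntax as equivalent to SPARQL's `regex()` for these simple patterns is an assumption. The second URI (the `www`-less style the comments describe) and the third one are illustrative, not actual query results, and `wikidata_qid` is a hypothetical helper:

```python
import re
from typing import Optional

# The question's pattern: it only matches URIs containing "www" before
# "wikidata" (its unescaped dots match any character).
ORIGINAL_PATTERN = re.compile(r"www.wikidata.org")

# A relaxed pattern (an assumption, not taken from the thread): "www." is
# optional, and the Q-identifier after /entity/ is captured.
WIKIDATA_ENTITY = re.compile(r"^https?://(?:www\.)?wikidata\.org/entity/(Q\d+)$")


def wikidata_qid(uri: str) -> Optional[str]:
    """Return the Q-ID if `uri` is a Wikidata entity URI, else None."""
    match = WIKIDATA_ENTITY.match(uri)
    return match.group(1) if match else None


same_as = [
    "http://www.wikidata.org/entity/Q7215695",  # style returned by the regular endpoint
    "http://wikidata.org/entity/Q7215695",      # www-less style (illustrative)
    "http://dbpedia.org/resource/Example",      # non-Wikidata link (made up)
]

for uri in same_as:
    # Prints: URI, whether the original regex matches, extracted Q-ID (or None)
    print(uri, bool(ORIGINAL_PATTERN.search(uri)), wikidata_qid(uri))
```

On the SPARQL side, the same idea could be expressed with a filter that does not require the `www.` prefix, for example `FILTER(CONTAINS(STR(?wikidata_concept), "wikidata.org"))`.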

1 Answer


DBpedia has two versions: the regular endpoint and DBpedia Live. I got different results because I was using a different version in each approach.

Changing sparql = SPARQLWrapper("http://live.dbpedia.org/sparql") to sparql = SPARQLWrapper("http://dbpedia.org/sparql") solved my issue, because the query editor and my Python code now use the same DBpedia version.
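Putting the fix together, here is a minimal sketch that assumes the `SPARQLWrapper` package is installed and the regular endpoint is reachable. The helper names (`build_query`, `extract_values`, `fetch_wikidata_uris`) are hypothetical. The explicit `PREFIX owl:` line is an addition: the query in the question works without it, so the endpoint evidently predefines `owl:`. The `CONTAINS` filter is a swapped-in alternative to the original `regex` so that both Wikidata URI styles match:

```python
from typing import List

# Endpoints named in this thread.
DBPEDIA_ENDPOINT = "http://dbpedia.org/sparql"            # regular, dump-based DBpedia
DBPEDIA_LIVE_ENDPOINT = "http://live.dbpedia.org/sparql"  # DBpedia Live


def build_query(resource: str) -> str:
    # CONTAINS accepts both "www.wikidata.org" and "wikidata.org" URIs.
    return f"""
        PREFIX owl: <http://www.w3.org/2002/07/owl#>
        SELECT DISTINCT ?wikidata_concept WHERE {{
            <{resource}> owl:sameAs ?wikidata_concept
            FILTER(CONTAINS(STR(?wikidata_concept), "wikidata.org"))
        }} LIMIT 100
    """


def extract_values(results: dict, var: str = "wikidata_concept") -> List[str]:
    # Pull the plain URI strings out of a SPARQL JSON result.
    return [b[var]["value"] for b in results["results"]["bindings"] if var in b]


def fetch_wikidata_uris(resource: str, endpoint: str = DBPEDIA_ENDPOINT) -> List[str]:
    # Imported here so the helpers above work without the package installed.
    from SPARQLWrapper import SPARQLWrapper, JSON
    sparql = SPARQLWrapper(endpoint)
    sparql.setQuery(build_query(resource))
    sparql.setReturnFormat(JSON)
    return extract_values(sparql.query().convert())
```

Calling `fetch_wikidata_uris("http://dbpedia.org/resource/Category:Michael_Jackson")` would query the regular endpoint, while passing `endpoint=DBPEDIA_LIVE_ENDPOINT` targets DBpedia Live instead. Keeping the endpoint as an explicit parameter makes it obvious which DBpedia version a result came from, which was the source of the mismatch here.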

EmJ