
I am trying to get some image links from Wikidata by running a SPARQL query on my local Jena Fuseki instance, and I want to merge them with data from my local graph. Unfortunately the query does not return any data; it just keeps running without any error message.

The SPARQL query:

PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>

SELECT ?name ?image WHERE { 
  ?s foaf:name ?name.
  ?s owl:sameAs ?wikidata_link.
  FILTER regex(str(?wikidata_link), "wikidata").
  SERVICE <https://query.wikidata.org/sparql> {
    ?wikidata_link wdt:P18 ?image.
  }

} LIMIT 10

The test data I have in my local graph on the Jena Fuseki server:

@base <http://dmt.de/pages> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix dbp: <http://dbpedia.org/resource/> .
@prefix wd: <https://www.wikidata.org/entity/> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .

<#john-cage>
    a foaf:Person ;
    foaf:name "John Cage";
    owl:sameAs dbp:John_Cage, wd:Q180727.

<#karlheinz-stockhausen>
    a foaf:Person ;
    foaf:name "Karlheinz Stockhausen";
    owl:sameAs dbp:Karlheinz_Stockhausen, wd:Q154556.

<#arnold-schoenberg>
    a foaf:Person;
    foaf:name "Arnold Schönberg";
    owl:sameAs dbp:Arnold_Schoenberg, wd:Q154770.

I tried a similar query against DBpedia, which ran perfectly:

PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX dbp: <http://dbpedia.org/resource/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX dbo: <http://dbpedia.org/ontology/>

SELECT ?name ?dbpedia_link ?birthplace WHERE { 
  ?s foaf:name ?name.
  ?s owl:sameAs ?dbpedia_link.
  FILTER regex(str(?dbpedia_link),"dbpedia.org").
  SERVICE <https://dbpedia.org/sparql> {
    ?dbpedia_link dbo:birthPlace ?birthplace.
  }

} LIMIT 10

Any ideas? Thanks in advance!

Jan Seipel
  • The main issue is how the query is evaluated. SERVICE is evaluated first, i.e. it retrieves all bindings matching the patterns inside it. At least, I think that is what Fuseki does; the SPARQL 1.1 spec is vague here, the only goal being to return the correct result. The remote result is joined with your outer triple patterns afterwards. For Wikidata the number of bindings is 3186880. For DBpedia it probably works because the endpoint has a default limit of 10000 results per request. This also means your result might be incomplete: it just returns a (random) set of 10000 bindings matching some birth places in DBpedia. – UninformedUser Jan 27 '20 at 19:41
  • Probably doesn't answer your question, but that's what I learned from federated querying. And be careful with the DBpedia endpoint, the limit of 10000 is definitely there to ensure fair use among all people using the public service – UninformedUser Jan 27 '20 at 19:46
  • Thx for the hint! Any suggestions how to solve this problem? The query runs for more than an hour now without any answer. This is not usable for me :( – Jan Seipel Jan 27 '20 at 20:39
  • I guess it's just the amount of data that has to be fetched? I tried with the Jena CLI tools, i.e. `bin/rsparql --service https://query.wikidata.org/sparql "PREFIX wdt: <http://www.wikidata.org/prop/direct/> select * {?wikidata_link wdt:P18 ?image}" > /tmp/res.sparql`, to run just the Wikidata part. It leads to a java.lang.OutOfMemoryError with default memory settings, since a huge JSON object has to be parsed and processed. Andy might know a way to solve this. Indeed, increasing the Java heap would help. – UninformedUser Jan 28 '20 at 07:25
  • Running for an extremely long time can be due to the GC cycling when very close to being out of memory. Changing the heap size can test for this. – AndyS Jan 28 '20 at 09:06
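
If the comments above are right that the unrestricted `?wikidata_link wdt:P18 ?image` pattern pulls millions of bindings, one possible workaround is to send Wikidata only the entities you already know, using a VALUES clause. The following is a minimal sketch under that assumption, not a tested fix for Fuseki: it only builds the query text, and the helper names `normalize_wikidata_iri` and `build_image_query` are made up for illustration. It also rewrites `https://www.wikidata.org/entity/` to `http://www.wikidata.org/entity/`, because the Turtle data in the question declares `wd:` with `https://` while Wikidata's entity IRIs (and the `wd:` prefix in the query) use `http://`; that mismatch may be worth checking independently.

```python
from typing import Iterable

# Wikidata entity IRIs use the http:// scheme; the local Turtle data
# in the question declares the wd: prefix with https:// instead.
WIKIDATA_HTTP = "http://www.wikidata.org/entity/"
WIKIDATA_HTTPS = "https://www.wikidata.org/entity/"


def normalize_wikidata_iri(iri: str) -> str:
    """Hypothetical helper: map an https:// entity IRI to the http:// form."""
    if iri.startswith(WIKIDATA_HTTPS):
        return WIKIDATA_HTTP + iri[len(WIKIDATA_HTTPS):]
    return iri


def build_image_query(entity_iris: Iterable[str]) -> str:
    """Hypothetical helper: build a query for wdt:P18 of the given entities only.

    Assumes the IRIs come from trusted local data and contain no '>' characters.
    """
    values = " ".join(f"<{normalize_wikidata_iri(iri)}>" for iri in entity_iris)
    return (
        "PREFIX wdt: <http://www.wikidata.org/prop/direct/>\n"
        "SELECT ?wikidata_link ?image WHERE {\n"
        f"  VALUES ?wikidata_link {{ {values} }}\n"
        "  ?wikidata_link wdt:P18 ?image .\n"
        "}"
    )


if __name__ == "__main__":
    # The three owl:sameAs targets from the test data in the question.
    links = [
        "https://www.wikidata.org/entity/Q180727",
        "https://www.wikidata.org/entity/Q154556",
        "https://www.wikidata.org/entity/Q154770",
    ]
    print(build_image_query(links))
```

The generated text can be sent to https://query.wikidata.org/sparql directly, or its VALUES line and triple pattern can be placed inside the SERVICE block of the original query, so that the remote endpoint only has to return rows for those three entities.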
