Prevent timeout while querying DBpedia endpoint using Apache Jena

Question

I'm using Apache Jena to fetch a large amount of data from DBpedia and write it to a CSV file. However, I only get about 10,000 triples, not the entire result. I need to fetch every triple matched by the query, but I can't tell whether this is an endpoint timeout or something else. My code is as follows:

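```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.concurrent.TimeUnit;

import org.apache.jena.query.*;

public class FetchCountriesData {

    public void getCountriesInformation() throws IOException {
        ParameterizedSparqlString qs = new ParameterizedSparqlString(
                "PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>\n"
                + "SELECT * { ?Subject rdf:type <http://dbpedia.org/ontology/Country> . ?Subject ?Predicate ?Object } ORDER BY ?Subject");

        // try-with-resources closes both the HTTP query execution and the CSV file
        try (QueryExecution exec = QueryExecutionFactory.sparqlService("https://dbpedia.org/sparql", qs.asQuery());
             OutputStream out = new FileOutputStream("C:/fakepath/CountryData.csv")) {
            exec.setTimeout(10, TimeUnit.MINUTES); // does not lift the endpoint's result-size limit
            ResultSet results = exec.execSelect();
            // A ResultSet can be read only once: after writing the CSV, a further
            // ResultSetFormatter.out(results) would print an empty table.
            ResultSetFormatter.outputAsCSV(out, results);
        }
    }
}
```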

Answered here several times. DBpedia is a public service, and the size of the result set is limited to 10,000 rows to ensure fairness among all users. You can paginate with `ORDER BY ?Subject LIMIT 10000 OFFSET n`, where `n` is a multiple of 10000. The better way would be to load the data yourself and process it with your own triple store and resources. — UninformedUser, Jan 10 '17 at 16:11
Or... run your own [DBpedia mirror in the cloud](http://virtuoso.openlinksw.com/dataspace/doc/dav/wiki/Main/VirtPayAsYouGoEBSBackedAMIDBpedia2015). — TallTed, Jan 10 '17 at 21:41
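The LIMIT/OFFSET pagination suggested in the first comment can be sketched as below. This is a minimal sketch, not a Jena API: `PagedFetch`, `pageQuery` and `fetchAll` are hypothetical names, and the function that actually runs each page against the endpoint is passed in by the caller (the Javadoc shows one possible wiring, assuming Jena 3.x's `QueryExecutionFactory.sparqlService`). It relies on the query ending in `ORDER BY`, so that the pages are stable between requests.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

/**
 * Hypothetical helper (not part of Jena) that pages through a query ending
 * in ORDER BY by appending LIMIT/OFFSET clauses.
 */
public class PagedFetch {

    /** Appends LIMIT/OFFSET for the given zero-based page number. */
    static String pageQuery(String orderedQuery, int pageSize, int page) {
        long offset = (long) page * pageSize;
        return orderedQuery + " LIMIT " + pageSize + " OFFSET " + offset;
    }

    /**
     * Runs one query per page until a page returns fewer than pageSize rows.
     * runPage is supplied by the caller; with Jena it could be, for example:
     *   q -> { try (QueryExecution exec =
     *                  QueryExecutionFactory.sparqlService("https://dbpedia.org/sparql", q)) {
     *            List<QuerySolution> rows = new ArrayList<>();
     *            exec.execSelect().forEachRemaining(rows::add);
     *            return rows; } }
     */
    static <T> List<T> fetchAll(String orderedQuery, int pageSize,
                                Function<String, List<T>> runPage) {
        List<T> all = new ArrayList<>();
        for (int page = 0; ; page++) {
            List<T> rows = runPage.apply(pageQuery(orderedQuery, pageSize, page));
            all.addAll(rows);
            if (rows.size() < pageSize) {
                return all; // a short (or empty) page means there is no more data
            }
        }
    }
}
```

Keep `pageSize` at or below the endpoint's cap (10,000 on DBpedia, per the comment): if the server truncates a page, the loop sees a short page and stops early. If you prefer Jena's object model, calling `setLimit` and `setOffset` on the `Query` returned by `qs.asQuery()` achieves the same as the string concatenation.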

score 1 · Answer 1 · answered Jan 10 '17 at 14:09

You are almost certainly hitting one of the limits on DBpedia's public endpoint. For further information, see http://wiki.dbpedia.org/OnlineAccess and http://lists.w3.org/Archives/Public/public-lod/2011Aug/0028.html
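To tell a result-size limit apart from a timeout, one option is to compare the number of rows you received with the total from a COUNT query over the same graph pattern. Below is a minimal sketch with hypothetical names (`LimitCheck`, `countQuery`, `diagnose`), assuming you run the COUNT query the same way as the SELECT, that both see the same data, and that the cap is the 10,000 rows mentioned in the comments on the question.

```java
/**
 * Hypothetical diagnostic helper (not part of Jena). It assumes the COUNT
 * query and the SELECT query see the same data.
 */
public class LimitCheck {

    /** Wraps a graph pattern (full IRIs, no prefixes) in a COUNT query. */
    static String countQuery(String pattern) {
        return "SELECT (COUNT(*) AS ?n) WHERE { " + pattern + " }";
    }

    /** Classifies a result from the rows received, the COUNT total and the row cap. */
    static String diagnose(long received, long total, long cap) {
        if (received >= total) {
            return "complete";
        }
        if (received == cap) {
            // exactly the cap while more rows exist: the server truncated the result
            return "truncated at the result-size limit";
        }
        return "incomplete for another reason, possibly a timeout";
    }
}
```

With Jena, `ResultSetFormatter.consume(results)` reads a result set to the end and returns the number of rows, which gives you the `received` value.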

answered Jan 10 '17 at 14:09

chrisis