I'm using Apache Jena to fetch a large amount of data from DBpedia and write it to a CSV file. However, I only get about 10,000 triples instead of the full result set. I need to fetch every triple that matches the query, and I can't tell whether this is an endpoint timeout or something else. My code is as follows:

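import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.concurrent.TimeUnit;
// Jena 3.x package names are assumed here; Jena 2.x uses com.hp.hpl.jena.query instead.
import org.apache.jena.query.*;

public class FetchCountriesData {

    public void getCountriesInformation() throws IOException {
        ParameterizedSparqlString qs = new ParameterizedSparqlString("PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> \n "
                + "SELECT * {     ?Subject rdf:type <http://dbpedia.org/ontology/Country> .     ?Subject ?Predicate ?Object } ORDER BY ?Subject ");

        // try-with-resources closes both the remote query execution and the output file
        try (QueryExecution exec = QueryExecutionFactory.sparqlService("https://dbpedia.org/sparql", qs.asQuery());
             OutputStream out = new FileOutputStream("C:/fakepath/CountryData.csv")) {
            exec.setTimeout(10, TimeUnit.MINUTES);
            // Copy the results into memory: a plain ResultSet can only be read once
            ResultSetRewindable results = ResultSetFactory.copyResults(exec.execSelect());
            ResultSetFormatter.outputAsCSV(out, results);
            results.reset(); // rewind so the same rows can also be printed to stdout
            ResultSetFormatter.out(results);
        }
    }
}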
TallTed
    This has been answered here several times. DBpedia is a public service, and the size of a result set is limited to 10,000 rows to ensure fairness among all users. You can use `ORDER BY ?Subject LIMIT 10000 OFFSET n` to paginate, where `n` is a multiple of 10000. A better option would be to download the data and process it with your own triple store or other local resources. – UninformedUser Jan 10 '17 at 16:11
    Or... run your own [DBpedia mirror in the cloud](http://virtuoso.openlinksw.com/dataspace/doc/dav/wiki/Main/VirtPayAsYouGoEBSBackedAMIDBpedia2015). – TallTed Jan 10 '17 at 21:41

1 Answer

You are almost certainly hitting one of DBpedia's limits. For further information, see http://wiki.dbpedia.org/OnlineAccess and http://lists.w3.org/Archives/Public/public-lod/2011Aug/0028.html
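
The `LIMIT`/`OFFSET` pagination suggested in the comments could be sketched roughly as follows. This is only a minimal illustration, not a tested DBpedia client: the class `PagedSparql`, its methods, and the `fetchPage` callback are hypothetical names, and the callback is assumed to run one query against the endpoint and return that page's rows.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

/** Sketch of LIMIT/OFFSET pagination; every name in this class is hypothetical. */
final class PagedSparql {

    /** Appends a LIMIT/OFFSET window to a query that already ends in ORDER BY. */
    static String pageQuery(String orderedQuery, int pageSize, long offset) {
        return orderedQuery.trim() + " LIMIT " + pageSize + " OFFSET " + offset;
    }

    /**
     * Requests one page after another until a page comes back shorter than pageSize.
     * fetchPage is assumed to execute the given query string and return its rows.
     */
    static <T> List<T> fetchAll(String orderedQuery, int pageSize,
                                Function<String, List<T>> fetchPage) {
        List<T> all = new ArrayList<>();
        long offset = 0;
        while (true) {
            List<T> page = fetchPage.apply(pageQuery(orderedQuery, pageSize, offset));
            all.addAll(page);
            if (page.size() < pageSize) {
                break; // a short or empty page means nothing is left to fetch
            }
            offset += pageSize;
        }
        return all;
    }
}
```

In practice, `fetchPage` would wrap the `QueryExecutionFactory.sparqlService(...)` call from the question and collect the solutions of each page, with `pageSize` set to 10000 to match the limit mentioned in the comments. The `ORDER BY` clause matters: without a stable ordering, consecutive pages are not guaranteed to be disjoint.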

chrisis