
I wrote a Java application using the Sesame (RDF4J) API to test the availability of more than 700 SPARQL endpoints, but it takes hours to complete, so I'm trying to distribute the application using the Hadoop/MapReduce framework.

The problem is that, in the mapper class, the method that should test the availability doesn't work; I suspect it cannot connect to the endpoint.

Here is the code I used:

public class DMap extends Mapper<LongWritable, Text, Text, Text> {

    protected boolean isActive(String sourceURL)
            throws RepositoryException, MalformedQueryException, QueryEvaluationException {
        boolean t = true;
        SPARQLRepository repo = new SPARQLRepository(sourceURL);
        repo.initialize();
        RepositoryConnection con = repo.getConnection();
        TupleQuery tupleQuery = con.prepareTupleQuery(QueryLanguage.SPARQL, "SELECT * WHERE { ?s ?p ?o . } LIMIT 1");
        tupleQuery.setMaxExecutionTime(120);
        TupleQueryResult result = tupleQuery.evaluate();
        if (!result.hasNext()) {
            t = false;
        }
        // Close in reverse order of acquisition: the result before the connection.
        result.close();
        con.close();
        repo.shutDown();
        return t;
    }

    public void map(LongWritable key, Text value, Context context) throws InterruptedException, IOException {
        String src = value.toString();
        String val = "null";
        try {
            boolean b = isActive(src);
            if (b) {
                val = "active";
            } else {
                val = "inactive";
            }
        } catch (MalformedQueryException | RepositoryException | QueryEvaluationException e) {
            e.printStackTrace();
        }
        context.write(new Text(src), new Text(val));
    }
}

The input is a TextInputFormat and it looks like this:
http://visualdataweb.infor.uva.es/sparql
...

The output is a TextOutputFormat, and this is what I'm getting:
http://visualdataweb.infor.uva.es/sparql null
...

Edit 1: as suggested by @james-leigh and @ChristophE, I used try-with-resources statements, but still no results:

public class DMap extends Mapper<LongWritable, Text, Text, Text> {

    public void map(LongWritable key, Text value, Context context) throws InterruptedException, IOException {
        String src = value.toString(), val = "";
        SPARQLRepository repo = new SPARQLRepository(src);
        repo.initialize();
        try (RepositoryConnection con = repo.getConnection()) {
            TupleQuery tupleQuery = con.prepareTupleQuery(QueryLanguage.SPARQL, "SELECT * WHERE { ?s ?p ?o . } LIMIT 1");
            tupleQuery.setMaxExecutionTime(120);
            try (TupleQueryResult result = tupleQuery.evaluate()) {
                val = result.hasNext() ? "active" : "inactive";
            }
        } finally {
            // Shut the repository down even if the query throws.
            repo.shutDown();
        }
        context.write(new Text(src), new Text(val));
    }
}

Thanks

S. Oued

1 Answer


Use try-with-resources statements. SPARQLRepository uses background threads that must be cleaned up properly.
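To illustrate why try-with-resources helps here: it guarantees that every declared resource is closed, and in reverse declaration order, even when the body throws. The sketch below uses a toy AutoCloseable (not RDF4J or Hadoop classes, so it stays self-contained) purely to show that ordering:

```java
import java.util.ArrayList;
import java.util.List;

public class CloseOrderDemo {
    // Records the order in which resources are closed.
    static final List<String> closed = new ArrayList<>();

    // Toy stand-in for a connection or query result.
    static class Resource implements AutoCloseable {
        final String name;
        Resource(String name) { this.name = name; }
        @Override
        public void close() { closed.add(name); }
    }

    public static void main(String[] args) {
        try (Resource con = new Resource("connection");
             Resource result = new Resource("result")) {
            // Work with the resources here; both are closed automatically,
            // even if this block throws an exception.
        }
        // Resources close in reverse declaration order: result first, then connection.
        System.out.println(closed); // prints [result, connection]
    }
}
```

The same ordering applies to RepositoryConnection and TupleQueryResult: declaring the connection first and the result second means the result is always closed before the connection it came from.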

  • Sorry, this is my first time reading about try-with-resources statements. I will read up on this and see what I can do. Thanks a lot – S. Oued Aug 25 '17 at 15:09
  • Hi there, sorry, but I'm not getting this to work: SPARQLRepository does not implement AutoCloseable, so I tried it with RepositoryConnection and TupleQueryResult, but the compiler said they don't implement AutoCloseable either, even though they extend it. – S. Oued Aug 25 '17 at 19:19
  • Hi @ChristophE, I migrated to the latest RDF4J API (2.2) and used try-with-resources statements, but that didn't help either, and now I'm getting some Hadoop errors that I don't understand, so I asked a [new question](https://stackoverflow.com/questions/45990580/what-this-hadoop-error-means-status-failed-error-instance) about what causes them. – S. Oued Sep 01 '17 at 01:00