
I'm generating an Apache Jena Graph from DBpedia dumps, and now I want to iterate through all "dbpedia-owl:abstract" triples. So I do something like this:

ExtendedIterator<Triple> iterator = graph.find(Node.ANY, NodeFactory.createURI("http://dbpedia.org/ontology/abstract"), Node.ANY);

But when I iterate, memory consumption keeps increasing, so it looks like the ExtendedIterator is storing the found nodes. Using the VisualVM profiler, I found that while I iterate, the count of com.hp.hpl.jena.graph.Node_URI instances keeps growing. I tried resetting the iterator, but that had no effect.

Can I iterate through all DBpedia abstracts without storing nodes?
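For reference, here is a minimal self-contained sketch of the lookup (the class and method names are illustrative, not from any library). Note that the predicate must be the full URI, since NodeFactory.createURI does not expand the dbpedia-owl: prefix, and that the iterator should be closed when you are done with it:

```java
import com.hp.hpl.jena.graph.Factory;
import com.hp.hpl.jena.graph.Graph;
import com.hp.hpl.jena.graph.Node;
import com.hp.hpl.jena.graph.NodeFactory;
import com.hp.hpl.jena.graph.Triple;
import com.hp.hpl.jena.util.iterator.ExtendedIterator;

public class AbstractScan {
    static final Node ABSTRACT =
            NodeFactory.createURI("http://dbpedia.org/ontology/abstract");

    // Counts dbpedia-owl:abstract triples while holding only one Triple at a time.
    static long countAbstracts(Graph graph) {
        ExtendedIterator<Triple> it = graph.find(Node.ANY, ABSTRACT, Node.ANY);
        long n = 0;
        try {
            while (it.hasNext()) {
                it.next();   // process the triple here instead of collecting it
                n++;
            }
        } finally {
            it.close();      // release the iterator's underlying resources
        }
        return n;
    }

    public static void main(String[] args) {
        Graph graph = Factory.createDefaultGraph();
        graph.add(Triple.create(
                NodeFactory.createURI("http://dbpedia.org/resource/Example"),
                ABSTRACT,
                NodeFactory.createLiteral("An example abstract.")));
        System.out.println(countAbstracts(graph));
    }
}
```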

Sorry for my bad English.

Yevhen Tienkaiev

1 Answer


Do you have to hold them all in a graph? You could handle the triples as you parse them with RIOT, using StreamRDF (or a convenient subclass such as StreamRDFBase). For example:

// StreamRDFBase provides no-op implementations of the other StreamRDF
// methods, so only triple(...) needs overriding. ("abstract" is a Java
// keyword, so the predicate is built as a Node constant here.)
class MyHandler extends StreamRDFBase {
  static final Node ABSTRACT =
      NodeFactory.createURI("http://dbpedia.org/ontology/abstract");

  @Override
  public void triple(Triple triple) {
    if (triple.predicateMatches(ABSTRACT)) {
      // ... process ...
    }
  }
}
StreamRDF myHandler = new MyHandler();
RDFDataMgr.parse(myHandler, "dbpedia-file.nt");
user205512
  • Thanks for your answer. Currently I hold all the data from the DBpedia dumps in Jena graphs for different kinds of extraction (not only extraction of "dbpedia-owl:abstract"), so the solution you propose would take some time, since it changes how I work with the DBpedia data in general. So, for now, I'm looking for a solution that lets me iterate through all the triples in a previously created Jena TDB store without memory consumption growing in a way the GC can't free. – Yevhen Tienkaiev Jul 10 '15 at 20:54