I am trying to analyse large data dumps for my semantic web project, which I built with Eclipse and Apache Jena. I am using a TDB database, which works fine for files around 2 GB, but I run into memory problems with files over 6 GB. My goal is to extract all subjects, predicates, and objects from a data source and write them into JSON files. Is there a way to query the TDB data directly, without loading everything into a model? Also: does model.read in the following code load the entire file into memory?
import java.io.File;
import java.io.InputStream;
import org.apache.jena.query.Dataset;
import org.apache.jena.rdf.model.Model;
import org.apache.jena.tdb.TDBFactory;
import org.apache.jena.util.FileManager;

HttpSession session = request.getSession();
session.setAttribute("dataSource", data);
ServletContext servletContext = request.getServletContext();
String tdbPath = servletContext.getRealPath("/tdb");
File dir = new File(tdbPath); // was new File(contextPath); contextPath is not defined here
Dataset dataset = TDBFactory.createDataset(tdbPath);
Model model = dataset.getDefaultModel();
InputStream str = FileManager.get().open(data);
model.read(str, null); // reads the stream into the model (defaults to RDF/XML syntax)
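For what it's worth, here is a minimal sketch of what I was hoping is possible: iterating over the triples with a SPARQL query against the TDB-backed dataset, rather than materialising everything in a Model first. This assumes the Jena 3.x package names (org.apache.jena.*; older releases use com.hp.hpl.jena.*), and "/path/to/tdb" is a placeholder for my actual TDB directory.

import org.apache.jena.query.Dataset;
import org.apache.jena.query.QueryExecution;
import org.apache.jena.query.QueryExecutionFactory;
import org.apache.jena.query.QuerySolution;
import org.apache.jena.query.ResultSet;
import org.apache.jena.tdb.TDBFactory;

// Sketch: select every triple and consume the results one row at a time.
Dataset dataset = TDBFactory.createDataset("/path/to/tdb");
String sparql = "SELECT ?s ?p ?o WHERE { ?s ?p ?o }";
try (QueryExecution qexec = QueryExecutionFactory.create(sparql, dataset)) {
    ResultSet results = qexec.execSelect();
    while (results.hasNext()) {
        QuerySolution row = results.nextSolution();
        // row.get("s"), row.get("p"), row.get("o") would be written to JSON here
    }
}
dataset.close();

If that is the right approach, would loading the dump with the tdbloader command-line tool (instead of model.read in the servlet) also avoid the memory problem?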