Hi, is it possible to see the number of triples being stored during TDB creation with the Java API? I run the TDBFactory on a RAR archive containing Turtle data, but while the files are being created in my directory I can't see how many triples have been stored. How can I solve this problem?
- It's not clear what you're asking. If you query `select (count(*) as ?nTriples) { graph ?g { ?s ?p ?o } }` after setting things up, you'll get a count of the triples, or you could take the TDB model that you get and ask for its .size(), or… What have you tried and what didn't work? – Joshua Taylor Jul 27 '14 at 22:10
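(For reference, a minimal sketch of both approaches, written against the Jena 2.x package names of the era; the store location is a hypothetical placeholder:)

import com.hp.hpl.jena.query.*;
import com.hp.hpl.jena.tdb.TDBFactory;

final Dataset ds = TDBFactory.createDataset("/path/to/tdb"); // hypothetical location

// Option 1: a SPARQL count over the named graphs, as in the comment above.
final QueryExecution qe = QueryExecutionFactory.create(
        "SELECT (count(*) AS ?nTriples) WHERE { GRAPH ?g { ?s ?p ?o } }", ds);
try {
    final ResultSet rs = qe.execSelect();
    System.out.println(rs.next().getLiteral("nTriples").getLong());
} finally {
    qe.close();
}

// Option 2: ask the default model for its size directly.
System.out.println(ds.getDefaultModel().size());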
- The problem is in the setting up: I want to see in real time how many triples my procedure is storing. – user3329477 Jul 27 '14 at 22:15
- The bulk loaders print out the triples/quads added as they go along. If you are not using the bulk loader (as you say, you're using the API), then, because it's transactional, you have to ask inside the transaction making the updates. – AndyS Jul 28 '14 at 11:17
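(A short sketch of asking for the count inside a transaction, under the same Jena 2.x assumptions and with a hypothetical location; note that a separate read transaction will only see data that has already been committed:)

// (imports as in the sketch above)
final Dataset ds = TDBFactory.createDataset("/path/to/tdb"); // hypothetical location
ds.begin(ReadWrite.READ);
try {
    // Inside the transaction, the dataset's committed size is visible.
    System.out.println("triples so far: " + ds.getDefaultModel().size());
} finally {
    ds.end();
}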
1 Answer
You can access the bulk loader through Java code (to view the triples as they are introduced) as follows:
final Dataset tdbDataset = TDBFactory.createDataset( /*location*/ );
try (final InputStream in = /*get input stream for your large file*/) {
    // The final 'true' argument enables the loader's progress output.
    TDBLoader.load(((DatasetGraphTransaction) tdbDataset.asDatasetGraph()).getBaseDatasetGraph(), in, true);
}
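With that flag set, the loader logs the running triple count as it adds data, which is what lets you watch the load in real time.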
If you have multiple files in your archive (for simplicity, I'll not do rar, but rather a zip), then, as per an answer to this question, you can get better performance by concatenating the files into a single stream before passing them to the bulk loader. The improved performance arises from delaying index creation until all triples have been introduced. I'm sure there are other formats that are supported, but I have only tested N-TRIPLES.

The following example utilizes IOUtils from commons-io for copying streams:
import java.io.InputStream;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;
import java.util.Enumeration;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

import org.apache.commons.io.IOUtils;

import com.hp.hpl.jena.query.Dataset;
import com.hp.hpl.jena.tdb.TDBFactory;
import com.hp.hpl.jena.tdb.TDBLoader;
import com.hp.hpl.jena.tdb.transaction.DatasetGraphTransaction;

final Dataset tdbDataset = TDBFactory.createDataset( /*location*/ );

// Pipe the concatenated entries from the reader thread to the loader thread.
final PipedOutputStream concatOut = new PipedOutputStream();
final PipedInputStream concatIn = new PipedInputStream(concatOut);
final ExecutorService workers = Executors.newFixedThreadPool(2);

// First worker: walk the archive and copy every entry onto the pipe.
final Future<Long> submitter = workers.submit(new Callable<Long>() {
    @Override
    public Long call() throws Exception {
        long filesLoaded = 0;
        try (final ZipFile zipFile = new ZipFile( /* Archive Location */ )) {
            final Enumeration<? extends ZipEntry> zipEntries = zipFile.entries();
            while (zipEntries.hasMoreElements()) {
                final ZipEntry entry = zipEntries.nextElement();
                try (final InputStream singleIn = zipFile.getInputStream(entry)) {
                    // If your file is in a supported format already:
                    IOUtils.copy(singleIn, concatOut);
                    // ...otherwise, parse and re-serialize it as N-TRIPLES:
                    // final Model m = ModelFactory.createDefaultModel();
                    // m.read(singleIn, null, "lang");
                    // m.write(concatOut, "N-TRIPLES");
                }
                filesLoaded++;
            }
        }
        concatOut.close(); // signals end-of-stream to the loader
        return filesLoaded;
    }
});

// Second worker: feed the combined stream to the bulk loader with progress output on.
final Future<Void> committer = workers.submit(new Callable<Void>() {
    @Override
    public Void call() throws Exception {
        TDBLoader.load(((DatasetGraphTransaction) tdbDataset.asDatasetGraph()).getBaseDatasetGraph(), concatIn, true);
        return null;
    }
});

workers.shutdown();
System.out.println("submitted " + submitter.get() + " input files for processing");
committer.get();
System.out.println("completed processing");
workers.awaitTermination(1, TimeUnit.SECONDS); // NOTE this wait is redundant
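The two worker threads are needed because the piped streams block: one thread writes the concatenated entries into concatOut while the other consumes concatIn in the loader, and closing concatOut is what signals end-of-stream to the loader.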