Hi, is it possible to see the number of triples being stored during TDB creation with the Java API? I run the TDBFactory on a RAR archive containing Turtle data, but while the files are being created in my directory I can't see how many triples have been stored. How can I solve this problem?
- It's not clear what you're asking. If you query `select (count(*) as ?nTriples) { graph ?g { ?s ?p ?o } }` after setting things up, you'll get a count of the triples, or you could take the TDB model that you get and ask for its .size(), or… What have you tried and what didn't work? – Joshua Taylor Jul 27 '14 at 22:10
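(For reference, a minimal sketch of both approaches, written against the Jena 2.x package names of the era; the store location is a hypothetical placeholder:)

import com.hp.hpl.jena.query.*;
import com.hp.hpl.jena.tdb.TDBFactory;

final Dataset ds = TDBFactory.createDataset("/path/to/tdb"); // hypothetical location

// Option 1: a SPARQL count over the named graphs, as in the comment above.
final QueryExecution qe = QueryExecutionFactory.create(
        "SELECT (count(*) AS ?nTriples) WHERE { GRAPH ?g { ?s ?p ?o } }", ds);
try {
    final ResultSet rs = qe.execSelect();
    System.out.println(rs.next().getLiteral("nTriples").getLong());
} finally {
    qe.close();
}

// Option 2: ask the default model for its size directly.
System.out.println(ds.getDefaultModel().size());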
- The problem is in the setting up: I want to see in real time how many triples my procedure is storing. – user3329477 Jul 27 '14 at 22:15
- The bulk loaders print out the triples/quads added as they go along. If you are not using the bulk loader (as you say, you're using the API), then, because it's transactional, you have to ask inside the transaction making the updates. – AndyS Jul 28 '14 at 11:17
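(A short sketch of asking for the count inside a transaction, under the same Jena 2.x assumptions and with a hypothetical location; note that a separate read transaction will only see data that has already been committed:)

// (imports as in the sketch above)
final Dataset ds = TDBFactory.createDataset("/path/to/tdb"); // hypothetical location
ds.begin(ReadWrite.READ);
try {
    // Inside the transaction, the dataset's committed size is visible.
    System.out.println("triples so far: " + ds.getDefaultModel().size());
} finally {
    ds.end();
}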
1 Answer
You can access the bulk loader through Java code (to view the triples as they are introduced) as follows:
final Dataset tdbDataset = TDBFactory.createDataset( /*location*/ );
try (final InputStream in = /*get input stream for your large file*/) {
    // The final 'true' argument enables the loader's progress output.
    TDBLoader.load(((DatasetGraphTransaction) tdbDataset.asDatasetGraph()).getBaseDatasetGraph(), in, true);
}
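With that flag set, the loader logs the running triple count as it adds data, which is what lets you watch the load in real time.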
If you have multiple files in your archive (for simplicity, I'll not do rar, but rather a zip), then, as per an answer to this question, you can get better performance by concatenating the files into a single stream before passing them to the bulk loader. The improved performance arises from delaying index creation until all triples have been introduced. I'm sure there are other formats that are supported, but I have only tested N-TRIPLES.

The following example utilizes IOUtils from commons-io for copying streams:
import java.io.InputStream;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;
import java.util.Enumeration;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

import org.apache.commons.io.IOUtils;

import com.hp.hpl.jena.query.Dataset;
import com.hp.hpl.jena.tdb.TDBFactory;
import com.hp.hpl.jena.tdb.TDBLoader;
import com.hp.hpl.jena.tdb.transaction.DatasetGraphTransaction;

final Dataset tdbDataset = TDBFactory.createDataset( /*location*/ );

// Pipe the concatenated entries from the reader thread to the loader thread.
final PipedOutputStream concatOut = new PipedOutputStream();
final PipedInputStream concatIn = new PipedInputStream(concatOut);
final ExecutorService workers = Executors.newFixedThreadPool(2);

// First worker: walk the archive and copy every entry onto the pipe.
final Future<Long> submitter = workers.submit(new Callable<Long>() {
    @Override
    public Long call() throws Exception {
        long filesLoaded = 0;
        try (final ZipFile zipFile = new ZipFile( /* Archive Location */ )) {
            final Enumeration<? extends ZipEntry> zipEntries = zipFile.entries();
            while (zipEntries.hasMoreElements()) {
                final ZipEntry entry = zipEntries.nextElement();
                try (final InputStream singleIn = zipFile.getInputStream(entry)) {
                    // If your file is in a supported format already:
                    IOUtils.copy(singleIn, concatOut);
                    // ...otherwise, parse and re-serialize it as N-TRIPLES:
                    // final Model m = ModelFactory.createDefaultModel();
                    // m.read(singleIn, null, "lang");
                    // m.write(concatOut, "N-TRIPLES");
                }
                filesLoaded++;
            }
        }
        concatOut.close(); // signals end-of-stream to the loader
        return filesLoaded;
    }
});

// Second worker: feed the combined stream to the bulk loader with progress output on.
final Future<Void> committer = workers.submit(new Callable<Void>() {
    @Override
    public Void call() throws Exception {
        TDBLoader.load(((DatasetGraphTransaction) tdbDataset.asDatasetGraph()).getBaseDatasetGraph(), concatIn, true);
        return null;
    }
});

workers.shutdown();
System.out.println("submitted " + submitter.get() + " input files for processing");
committer.get();
System.out.println("completed processing");
workers.awaitTermination(1, TimeUnit.SECONDS); // NOTE this wait is redundant
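The two worker threads are needed because the piped streams block: one thread writes the concatenated entries into concatOut while the other consumes concatIn in the loader, and closing concatOut is what signals end-of-stream to the loader.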