In Jena's TDB, it seems that data is organized into a "Dataset" (specified by a directory), which can contain multiple "named graphs".

Regarding the concurrency policy for querying such data, the only documentation on concurrency I found is the following sentence from the TDB documentation (TDB Java API):

It is possible to act directly on the dataset without transaction with a Multiple Reader or Single Writer (MRSW) policy for concurrency access.

However, I'm not sure about the granularity of this MRSW policy. Does it apply to the whole Dataset, or to an individual named graph within the Dataset?

EDIT: More specifically, I want to do write-only updates to different named graphs (each thread writing to a different named graph) without any read operations. Would that be possible, or do I have to let only one thread update at a time?

zack
  • I can't say (because I don't know) whether your update is the right answer or not, but you should post it as an answer, since it's more of an answer than an elaboration on the question. It's quite alright, even encouraged, to answer your own question on StackOverflow since, after all, you're the one who knows best what the most useful answer was. By posting it as an answer, it will be clearer to other users what worked for you, and there (if you also [accept it](http://meta.stackexchange.com/q/5234/225437)) will be one less question in the system without an accepted answer. – Joshua Taylor Sep 24 '13 at 14:21
  • agree, added my answer – zack Sep 24 '13 at 14:54

3 Answers

Given that the linked documentation says

It is possible to act directly on the dataset without transaction with a Multiple Reader or Single Writer (MRSW) policy for concurrency access.

I expect that if more than one writer will access the dataset, even if they write to different named graphs, you should be using transactions. The documentation on TDB Transactions says this about write transactions:

The general pattern is:

 dataset.begin(ReadWrite.WRITE) ;
 try {
   // ... updates to the dataset ...
   dataset.commit() ;
 } finally { 
   dataset.end() ; 
 }

and those calls to begin, commit, and end are made on the dataset, not on individual named graphs.
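To make the granularity concrete, here is a small plain-Java toy model. It is not Jena's API: `ToyTxnDataset` and its methods are hypothetical, and the dataset-wide `ReentrantReadWriteLock` is an assumption standing in for whatever TDB does internally. It shows the shape described above: `beginWrite`, `commit` and `end` are methods of the dataset, so a writer to `g1` and a writer to `g2` contend for the same lock, and changes not committed before `end` are discarded.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical toy model, NOT Jena's API: begin/commit/end belong to the
// dataset, and a write transaction locks the whole dataset no matter which
// named graph it touches.
final class ToyTxnDataset {
    private final Map<String, List<String>> graphs = new HashMap<>(); // graph name -> triples
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private final List<String[]> pending = new ArrayList<>();          // {graph, triple} not yet committed

    void beginWrite() {
        lock.writeLock().lock();   // one lock for the whole dataset
        pending.clear();
    }

    void add(String graph, String triple) {
        requireWriteTxn();
        pending.add(new String[] {graph, triple});
    }

    void commit() {
        requireWriteTxn();
        for (String[] w : pending) {
            graphs.computeIfAbsent(w[0], g -> new ArrayList<>()).add(w[1]);
        }
        pending.clear();
    }

    void end() {
        requireWriteTxn();
        pending.clear();           // anything not committed is discarded
        lock.writeLock().unlock();
    }

    List<String> read(String graph) {
        lock.readLock().lock();    // readers share the lock; a writer excludes them
        try {
            return new ArrayList<>(graphs.getOrDefault(graph, Collections.emptyList()));
        } finally {
            lock.readLock().unlock();
        }
    }

    private void requireWriteTxn() {
        if (!lock.isWriteLockedByCurrentThread()) {
            throw new IllegalStateException("no write transaction on this thread");
        }
    }
}
```

A write follows the same begin / try / commit / finally / end pattern as the TDB snippet; calling `add` outside a write transaction throws `IllegalStateException`.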

Many triple stores (and I think TDB is included in this) treat triples in named graphs as quadruples (often just called quads). A triple a b c in a named graph g1 could be stored alongside a triple d e f in a named graph g2 in the same quad-table:

g1 a b c
g2 d e f

and then this quad-table, which represents a single dataset, can be indexed on any of the four columns. In this representation, the named-graph part of the data isn't really any different from the rest of the data, so named graphs don't provide any insulation from concurrency issues. Indeed, since SPARQL queries and updates can in general read from or update multiple named graphs, there's no way to know in advance which named graphs a query or update will touch.
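A minimal sketch of that quad-table idea in plain Java, assuming a naive in-memory list instead of TDB's real indexes (`ToyQuadTable` is hypothetical). The graph name is just the first column, and passing `null` as a wildcard in that position matches quads from every named graph, which is why the graph alone gives no natural boundary for locking.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical sketch: one quad table for a whole dataset.
// The graph name is just one more column, like subject/predicate/object.
final class ToyQuadTable {
    static final class Quad {
        final String g, s, p, o;

        Quad(String g, String s, String p, String o) {
            this.g = g; this.s = s; this.p = p; this.o = o;
        }

        @Override public String toString() { return g + " " + s + " " + p + " " + o; }
    }

    private final List<Quad> quads = new ArrayList<>();

    void add(String g, String s, String p, String o) {
        quads.add(new Quad(g, s, p, o));
    }

    // null acts as a wildcard, so any column (including the graph) can be matched or left open.
    List<Quad> find(String g, String s, String p, String o) {
        return quads.stream()
                .filter(q -> (g == null || q.g.equals(g)) && (s == null || q.s.equals(s))
                          && (p == null || q.p.equals(p)) && (o == null || q.o.equals(o)))
                .collect(Collectors.toList());
    }
}
```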

Joshua Taylor
  • Thanks for the answer. I should clarify the question that if I want to do write-only updates to different named graphs (each thread writing to a different named graph), would it be possible? Or do I have to let one thread update at a time. – zack Sep 24 '13 at 00:02
  • Apparently you can do Dataset dataset = TDBFactory.createDataset("demo"); Model model = dataset.getNamedModel("aModel"); try { model.enterCriticalSection(false); //Write Lock ... model.commit(); TDB.sync(model); } finally { model.leaveCriticalSection(); } Sorry for the bad formatting. I updated this in my own answer – zack Sep 24 '13 at 00:17

OK. Apparently one can write the following code:

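// Jena 3+ package names; Jena 2.x (current in 2013) used com.hp.hpl.jena.* instead.
import org.apache.jena.query.Dataset;
import org.apache.jena.rdf.model.Model;
import org.apache.jena.shared.Lock;
import org.apache.jena.tdb.TDB;
import org.apache.jena.tdb.TDBFactory;

Dataset dataset = TDBFactory.createDataset("demo");
Model model = dataset.getNamedModel("aModel");
model.enterCriticalSection(Lock.WRITE);   // Lock.WRITE == false: request a write lock
try {
    // write triples to model

    model.commit();
    TDB.sync(model);
} finally {
    model.leaveCriticalSection();
}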

Based on this, I think there should not be any problem with writing to different named graphs concurrently, though I have not tested it yet.

zack

It is not safe to write to two graphs in the same dataset at the same time.

It may seem to work without transactions, but it is potentially unsafe. The code is likely to detect this and warn, but that is not guaranteed.

You should use transactions.

When two writers try to write, there is no true parallel write (there is internal locking to keep everything safe).
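A toy illustration of that behaviour, assuming (as a simplification, not a description of TDB's actual internals) a single writer lock per dataset. `ToySerializedWriters` is hypothetical: it starts one writer thread per named graph and records how many writers are inside the critical section at once, which can never exceed one.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical simplification, not TDB's real internals: one writer lock per
// dataset, so writers aimed at different named graphs still take turns.
final class ToySerializedWriters {
    private final ReentrantLock datasetWriteLock = new ReentrantLock();
    private final AtomicInteger activeWriters = new AtomicInteger();
    private final AtomicInteger peakWriters = new AtomicInteger();

    void write(String graph, int triples) {
        datasetWriteLock.lock();
        try {
            peakWriters.accumulateAndGet(activeWriters.incrementAndGet(), Math::max);
            try {
                for (int i = 0; i < triples; i++) {
                    Thread.onSpinWait();   // stand-in for writing one triple into `graph`
                }
            } finally {
                activeWriters.decrementAndGet();
            }
        } finally {
            datasetWriteLock.unlock();
        }
    }

    /** Starts one writer thread per graph and returns the peak number of simultaneous writers. */
    int runWriters(int triplesPerGraph, String... graphs) {
        Thread[] threads = new Thread[graphs.length];
        for (int i = 0; i < graphs.length; i++) {
            String g = graphs[i];
            threads[i] = new Thread(() -> write(g, triplesPerGraph));
            threads[i].start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return peakWriters.get();
    }
}
```

For example, `new ToySerializedWriters().runWriters(100_000, "g1", "g2", "g3")` returns 1 even though each thread targets a different graph.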

If you want to emphasize writes, consider having two datasets, then create a general-purpose one (an in-memory structure) containing models from each of the separate datasets.
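A hedged sketch of that two-dataset layout in plain Java. `ToyStore` and `ToyUnionView` are hypothetical: each `ToyStore` stands in for one separately stored dataset with its own writer lock, so writers to different stores never wait on each other, and `ToyUnionView` stands in for the in-memory general-purpose dataset that exposes each store as a named graph.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical stand-in for one separately stored dataset: it has its own
// writer lock, so writing to it never waits on a writer to another store.
final class ToyStore {
    private final ReentrantLock lock = new ReentrantLock();
    private final List<String> triples = new ArrayList<>();

    void add(String triple) {
        lock.lock();
        try { triples.add(triple); } finally { lock.unlock(); }
    }

    List<String> snapshot() {
        lock.lock();   // simplistic: the same lock gives a consistent copy of this store
        try { return new ArrayList<>(triples); } finally { lock.unlock(); }
    }
}

// Hypothetical stand-in for the general-purpose in-memory dataset that
// exposes each separate store under a graph name.
final class ToyUnionView {
    private final Map<String, ToyStore> graphs = new LinkedHashMap<>();

    void addNamedGraph(String name, ToyStore store) { graphs.put(name, store); }

    List<String> graph(String name) { return graphs.get(name).snapshot(); }

    // Each store is copied separately, so this is not one atomic snapshot.
    List<String> allTriples() {
        List<String> all = new ArrayList<>();
        for (ToyStore s : graphs.values()) all.addAll(s.snapshot());
        return all;
    }
}
```

One trade-off is visible in `allTriples`: each store is copied under its own lock, so a read across both is not a single consistent snapshot.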

In practice, true parallel writers may not give you much advantage over write transactions to the same database if there is only one path to disk, as on a conventional server with one disk. CPU and RAM are not the limitation.

AndyS