Traversing a Berkeley DB database in store order

Question

When using cursors in Berkeley DB JE, I found that traversing a dataset generates a lot of random read I/O. This happens because BDB traverses the dataset in ascending primary key order.

My application has no requirement to process the dataset in any particular order (mathematically speaking, my operation is commutative), and I am interested in maximizing throughput.

Is there any way to process the dataset with a cursor in store order rather than in primary key order?

Adrian · Accepted Answer · 2011-11-28T11:14:44.203

I would guess not. BDBJE is a log-structured database, i.e., all writes are appended to the end of a log. This means that records are always appended to the newest log file and may supersede records in earlier log files. Because BDBJE cannot, by design, write to old logs, it cannot mark old records as superseded. You therefore cannot simply walk forward through the storage processing records, because you cannot tell whether a record is current without having processed the records that come later in the log.
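
To make the problem concrete, here is a minimal sketch of an append-only store in plain Java. It is a toy model, not JE's actual on-disk layout: the class LogScanSketch, its methods, and the sample data are all made up for illustration. A list stands in for the log, and a TreeMap stands in for the key-ordered index that knows which version of each key is current.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

/** Toy model of an append-only store; an illustration, not JE's real layout. */
final class LogScanSketch {
    // The "log": entries are only ever appended; the list index plays the role of a file offset.
    private final List<String[]> log = new ArrayList<>();
    // Key-ordered index (stand-in for the B-tree): key -> offset of the current version.
    private final TreeMap<String, Integer> index = new TreeMap<>();

    void put(String key, String value) {
        log.add(new String[] {key, value});
        index.put(key, log.size() - 1); // the older version stays in the log, unmarked
    }

    /** Walks the log front to back with no other information: stale versions show up too. */
    List<String> naiveScan() {
        List<String> out = new ArrayList<>();
        for (String[] e : log) {
            out.add(e[0] + "=" + e[1]);
        }
        return out;
    }

    /** Offsets touched by a key-ordered cursor: they jump around the log. */
    List<Integer> keyOrderOffsets() {
        return new ArrayList<>(index.values());
    }

    /** Log-order walk that asks the index whether each entry is still the current one. */
    List<String> liveScanInLogOrder() {
        List<String> out = new ArrayList<>();
        for (int offset = 0; offset < log.size(); offset++) {
            String[] e = log.get(offset);
            if (index.get(e[0]) == offset) {
                out.add(e[0] + "=" + e[1]);
            }
        }
        return out;
    }

    /** Hypothetical sample data: keys "a" and "b" are each written twice. */
    static LogScanSketch sample() {
        LogScanSketch s = new LogScanSketch();
        s.put("b", "1");
        s.put("a", "1");
        s.put("c", "1");
        s.put("b", "2");
        s.put("a", "2");
        return s;
    }

    public static void main(String[] args) {
        LogScanSketch s = sample();
        System.out.println(s.naiveScan());          // [b=1, a=1, c=1, b=2, a=2]
        System.out.println(s.keyOrderOffsets());    // [4, 3, 2]
        System.out.println(s.liveScanInLogOrder()); // [c=1, b=2, a=2]
    }
}
```

The naive scan returns the superseded b=1 and a=1, and the key-ordered walk visits offsets 4, 3, 2, which on a real disk would mean seeking back and forth. The last method shows that a log-order pass only becomes safe once something outside the log (here, the index) says which entries are live.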

As the "live" record count of old logs diminishes, BDBJE cleans them by copying the live records forward into new logs and deleting the old files, which shuffles the ordering even more.

I found the Java binding of Kyoto Cabinet to be faster than BDB for raw insert performance, and it gives you a choice of storage formats, which may allow you to optimize the performance of cursor-ordered record traversal. The licenses are similar (Kyoto Cabinet is GPL3, BDB is under the Oracle BDB License, which is copyleft) unless you pay for a commercial license in either case.

Update: As of version 5.0.34, BDBJE includes the DiskOrderedCursor class, which addresses this use case. It traverses records in log sequence, which in an unfragmented log file should be the same as disk order.
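
A minimal sketch of such a scan might look like the following. It assumes the JE jar (5.0.34 or later) is on the classpath, and the class name DiskOrderedScan and the two command-line arguments (environment directory and database name) are placeholders chosen for this example. The loop only counts records; a real application would do its order-independent processing there.

```java
import com.sleepycat.je.Database;
import com.sleepycat.je.DatabaseConfig;
import com.sleepycat.je.DatabaseEntry;
import com.sleepycat.je.DiskOrderedCursor;
import com.sleepycat.je.DiskOrderedCursorConfig;
import com.sleepycat.je.Environment;
import com.sleepycat.je.EnvironmentConfig;
import com.sleepycat.je.LockMode;
import com.sleepycat.je.OperationStatus;

import java.io.File;

final class DiskOrderedScan {
    public static void main(String[] args) {
        File envHome = new File(args[0]); // assumption: an existing JE environment directory
        String dbName = args[1];          // assumption: name of an existing database in it

        EnvironmentConfig envConfig = new EnvironmentConfig();
        envConfig.setReadOnly(true);
        Environment env = new Environment(envHome, envConfig);
        try {
            DatabaseConfig dbConfig = new DatabaseConfig();
            dbConfig.setReadOnly(true);
            Database db = env.openDatabase(null, dbName, dbConfig);
            try {
                // Default configuration; DiskOrderedCursorConfig has tuning options not used here.
                DiskOrderedCursor cursor = db.openCursor(new DiskOrderedCursorConfig());
                try {
                    DatabaseEntry key = new DatabaseEntry();
                    DatabaseEntry data = new DatabaseEntry();
                    long count = 0;
                    // Records arrive in no particular key order.
                    while (cursor.getNext(key, data, LockMode.READ_UNCOMMITTED)
                            == OperationStatus.SUCCESS) {
                        count++; // replace with the real per-record work
                    }
                    System.out.println("records seen: " + count);
                } finally {
                    cursor.close();
                }
            } finally {
                db.close();
            }
        } finally {
            env.close();
        }
    }
}
```

Because the cursor makes no promise about key order, this only fits workloads like the one in the question, where the result does not depend on the order of processing. If the database is being written to during the scan, check the DiskOrderedCursor Javadoc for the consistency it offers before relying on the results.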

Since writing this comment, Kyoto Cabinet has begun shipping a FOSSEXCEPTION file with its sources that permits shipping of "Kyoto Products" with other OSS software under GPL-incompatible licenses. This makes its licensing less restrictive than that of Berkeley DB JE. The exception only ships with the main C++ sources at present, but "Kyoto Products" would seem to imply the various language bindings as well. – Adrian, Oct 14 '11 at 19:01

score 0 · Answer 2 · answered May 10 '11 at 20:10

There are new "bulk-access" interfaces available that allow one to read multiple, presumably contiguous, records into a buffer using either the Db#get() or Dbc#get() method in concert with the DB_MULTIPLE flag.

That documentation is for version 4.2.52, and I had some trouble finding documentation for the com.sleepycat.db package on Oracle's site. Here I found the documentation for version 4.8.30, but the classes Db and Dbc are not mentioned there.

Ah, classes MultipleEntry and MultipleDataEntry look to be promising equivalents to the use of DB_MULTIPLE above. The idea is that when you fetch data using, say, MultipleDataEntry with a suitably-sized buffer, you'll get back a whole bunch of records together that can then be picked apart using MultipleDataEntry#next().

I get the impression that this part of the interface has been in flux. As I don't have a fresh enough version of the library available on my project, I can't claim to have used these bulk-fetching interfaces yet. Please report back if you're able to investigate their use.

answered May 10 '11 at 20:10

seh


This is because we are talking about slightly different database implementations. You mean BDB as a Java binding to the C-based Berkeley DB, and I mean Berkeley DB JE, which is written in Java. Their APIs differ somewhat. – Denis Bazhenov May 10 '11 at 21:23
Are you sure about that? It looks to me like the package (re)naming just takes the "Java Edition" qualifier out. I've been working with the JE library for a few years, so I understand what you're looking for, but it looks to me like the documentation I referenced is a rather fresh look at the JE product's evolution. – seh May 10 '11 at 22:04
Yes, I'm pretty sure about that. By the way, the link you provided is part of the Berkeley DB library documentation, which notes that the Java API is just a front end to the native C library: http://download.oracle.com/docs/cd/E17275_01/html/programmer_reference/java.html. – Denis Bazhenov May 11 '11 at 12:42
Quote from documentation: "The Berkeley DB Java classes are mostly implemented in native methods. Before you can use them, you need to make sure that the DLL or shared library containing the native methods can be found by your Java runtime" – Denis Bazhenov May 11 '11 at 12:43
Thanks for the clarification. Given that, one would expect that there should be an equivalent capability in JE, but so far I have not been able to find anything like it. – seh May 11 '11 at 13:12
