
I'm thinking about using Berkeley DB as part of the backend for a highly concurrent mobile application. For my application, using Queues for their record-level locking would be ideal. However, as stated in the title, I need to query and update data that would be conceptually modeled like Map<Number,Map<Number,Number>>.

The outer key would reference a unique Item, and the inner key would reference one of that Item's metrics. The inner value would be a counter that I need to atomically increment, possibly very frequently. Hence, record-level locking is a desirable feature here; ideally, a record would correspond to an Item in the data model.

The data would be used in the following two ways:

  1. Add <Number,Map<Number,Number>> entry

    • Relatively infrequent
  2. Atomically increment a batch of ~15 metrics in the database, given an Item id and a list of metric ids

    Then, get that Item's metric map

    • Very frequent

The inner Map should be able to grow, but it would not get larger than 200 entries.

And that's it.
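
To make the access pattern concrete, here is a minimal in-memory sketch of the two operations (all names are illustrative; this is just the behavior I need the database to provide, not any particular API):

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// In-memory stand-in for the data model: itemId -> (metricId -> counter).
class MetricStore {
    private final Map<Long, Map<Long, Long>> items = new ConcurrentHashMap<>();

    // Operation 1 (relatively infrequent): add a new Item with its metric map.
    void addItem(long itemId, Map<Long, Long> metrics) {
        items.put(itemId, new HashMap<>(metrics));
    }

    // Operation 2 (very frequent): atomically increment a batch of metrics for
    // one Item, then return a snapshot of that Item's metric map.
    Map<Long, Long> incrementAndGet(long itemId, List<Long> metricIds) {
        Map<Long, Long> metrics = items.get(itemId);    // assumes the Item was added first
        synchronized (metrics) {                        // stands in for record-level (Item-level) locking
            for (long metricId : metricIds) {
                metrics.merge(metricId, 1L, Long::sum);
            }
            return new HashMap<>(metrics);
        }
    }
}
```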

Do you think Berkeley DB would be suitable for this use case?

Update:

Apparently, the schema of my data isn't clear enough, so I'm going to break it down further.

An Item has many metrics, each of which has one counter, i.e. one-to-(many-to-one), i.e. <Number,Map<Number,Number>>.

But I have many Items, so what I need is a Map<Number,Map<Number,Number>>.

the beest
  • Can you provide an example table (?) to make the schema of your data explicit, please? What problem are you trying to solve? I can't think of a problem solved by Map<Number,Map<Number,Number>>. – amirouche Jun 27 '16 at 20:13
  • Is the key Number of Map immutable? – amirouche Jun 27 '16 at 20:18
  • Are Number keys (inner and outer) already provided, or you need them to be generated by the database? – amirouche Jun 27 '16 at 20:19
  • Why do you need record-level locking? Aren't ACID transactions enough? Is GPL an option? – amirouche Jun 27 '16 at 20:21
  • Do you need multiprocessing and/or multithreading? – amirouche Jun 27 '16 at 20:22
  • @amirouche the key Number of Map is immutable – the beest Jun 27 '16 at 20:47
  • @amirouche Number keys are already provided – the beest Jun 27 '16 at 20:48
  • @amirouche I would think I need record-level locking because a counter can't be incremented by two processes at the same time. If ACID transactions cover that possibility, then they would be enough. I don't know what GPL is. – the beest Jun 27 '16 at 21:00
  • @amirouche Multiple requests will be made to the database at the same time, so being able to run multiple processes would be ideal. All the queries I'd be running are listed above. Neither should take much time, so if I understand the difference between multithreading and multiprocessing correctly, multithreading shouldn't be necessary in this case. – the beest Jun 27 '16 at 21:40
  • @amirouche check the update for clarification on the schema of the data – the beest Jun 27 '16 at 21:40

1 Answer


I think Berkeley DB would be a fine choice, with some caveats about how you choose to lay out your data. But, you may want to consider other key-value stores too - LMDB, for example, should prove much easier to get going with than BDB.

At first blush, it seems like a record (the "value" in "key/value") in your system could be your inner Map<Number, Number>. The queue access method (or btree, FWIW) provides the outer Map<Number, Record>.

Berkeley DB doesn't provide much (really any) assistance for accessing stuff inside the record. So, you'd still have to represent your inner Map in some way that allows random access and modification of its contents. Depending on how big the first Number is in Map<Number, Number>, you could do a simple C-style array. You could use a JSON object, or a protobuf or, well, anything else you can think of.
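
For instance, here is a minimal sketch of one possible encoding, assuming 32-bit metric ids and 64-bit counters laid out as a flat sequence of pairs (the widths and layout are assumptions, not anything BDB prescribes):

```java
import java.nio.ByteBuffer;
import java.util.HashMap;
import java.util.Map;

// Encodes/decodes the inner Map<Number, Number> as the record's byte[] value.
final class InnerMapCodec {
    // Each entry is a 4-byte metric id followed by an 8-byte counter.
    static byte[] encode(Map<Integer, Long> metrics) {
        ByteBuffer buf = ByteBuffer.allocate(metrics.size() * 12);
        for (Map.Entry<Integer, Long> e : metrics.entrySet()) {
            buf.putInt(e.getKey());
            buf.putLong(e.getValue());
        }
        return buf.array();
    }

    static Map<Integer, Long> decode(byte[] record) {
        ByteBuffer buf = ByteBuffer.wrap(record);
        Map<Integer, Long> metrics = new HashMap<>();
        while (buf.remaining() >= 12) {
            metrics.put(buf.getInt(), buf.getLong());
        }
        return metrics;
    }
}
```

At the ~200 entries you mention, such a record is only about 2.4 KB, which is a comfortable size for a single value.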

This layout only makes sense if you have many, many entries in the outer Map, compared to the 200 or so entries you mentioned for the size of the inner map. Record-level locking applies to the whole inner Map, because that's your record.

Another technique is to create a composite key out of the first two Numbers in your schema. That is, Map<NumberX, Map<NumberY, NumberZ>> becomes a database with key NumberX_NumberY and value NumberZ. This would give you fast random access to any particular entry in your inner map, but you'd have to use a cursor to retrieve all the entries in the whole inner map.
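
A sketch of one way to pack that composite key, putting the item id in the high 32 bits and the metric id in the low 32 bits (the 32/32 split and non-negative ids are assumptions; stored big-endian, a btree would also keep all of an item's entries adjacent for cursor range scans):

```java
import java.nio.ByteBuffer;

// Composite key: NumberX (item id) in the high half, NumberY (metric id) in the low half.
// Assumes non-negative ids so that byte-wise key order matches numeric order.
final class CompositeKey {
    static byte[] encode(int itemId, int metricId) {
        long packed = ((long) itemId << 32) | (metricId & 0xFFFFFFFFL);
        return ByteBuffer.allocate(8).putLong(packed).array();   // big-endian by default
    }

    static int itemId(byte[] key)   { return (int) (ByteBuffer.wrap(key).getLong() >>> 32); }
    static int metricId(byte[] key) { return (int) ByteBuffer.wrap(key).getLong(); }
}
```

The value stored under each such key would then just be the 8-byte counter (NumberZ).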

Mike Andrews
  • It doesn't seem like Berkeley DB permits operations on the value part of a record -- [Oracle link](https://docs.oracle.com/cd/E17275_01/html/programmer_reference/intro_dbisnot.html). In my use case, batch incrementations of metrics would be occurring very frequently and concurrently. Do you still think Berkeley DB or LMDB would be an appropriate solution? – the beest Jun 29 '16 at 18:26
  • Well, concurrently on the same "record" would be a problem. You could consider splitting the data up as I suggested there at the end, by making a record for each individual metric, and having a key that was a composite of your two Map keys (e.g. by shifting one of the Numbers over N bits, then or-ing it with the other Number). You'll need to use a cursor at the very least to retrieve batches of metrics. – Mike Andrews Jun 29 '16 at 21:57
  • FWIW, I think concurrent modifications to any single record will be a problem for nearly any database. With an embedded database, you've got some control over how you update the record. But, there's no magic... you'll either have to deal with transactions and retrying on conflicting updates, or locking, or logging, or all of the above (a rough sketch of the transaction-and-retry approach appears after these comments). – Mike Andrews Jun 29 '16 at 22:06
  • I'm not sure how you're using "records". Are you referring to concurrent updates to counters, or concurrent updates to a particular `Item`? In any case, the problem is easily solved by locking the record -- an `Item` -- as it's being updated, which is not an uncommon feature in databases. Berkeley DB queues have it, but the problem is they don't support in-database updates to the value or `Map` datatypes, so I'd have to use your composite approach to organize my data. And updates would involve reading from the database, modifying in the application, and writing back to the database. – the beest Jun 30 '16 at 14:47
  • Which is two more steps than I'd like -- native support for increment updates is what I'm looking for. So now I'm leaning towards `Aerospike`, a NoSQL store that will let me store my `Item`s as records (rows) with a variable number of bins (columns) -- which is basically an inner map -- and it supports locking at the bin level and natively supports incrementing a bin. Seems extremely fitting for my application. – the beest Jun 30 '16 at 14:56
  • Thanks for your help thus far. Unless you have any more significant pros for `Berkeley DB` or significant cons against `Aerospike`, it looks like I'll be going with `Aerospike`. – the beest Jun 30 '16 at 14:57
  • Aerospike looks quite good! And, you're totally right, that solution they have covers your inner-map requirement quite nicely. Good find! – Mike Andrews Jun 30 '16 at 16:01
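
For completeness, here is a rough sketch of how the batch increment could look with the per-metric composite-key layout from the answer, written against the Berkeley DB Java Edition API (com.sleepycat.je) and reusing the hypothetical CompositeKey helper above; the retry handling is simplified, and everything here is illustrative rather than a recommended implementation:

```java
import java.nio.ByteBuffer;
import java.util.List;

import com.sleepycat.je.Database;
import com.sleepycat.je.DatabaseEntry;
import com.sleepycat.je.Environment;
import com.sleepycat.je.LockConflictException;
import com.sleepycat.je.LockMode;
import com.sleepycat.je.OperationStatus;
import com.sleepycat.je.Transaction;

final class MetricIncrementer {
    // Atomically adds 1 to each (itemId, metricId) counter; retries the whole
    // batch if the transaction hits a lock conflict with a concurrent writer.
    static void incrementBatch(Environment env, Database db, int itemId, List<Integer> metricIds) {
        while (true) {
            Transaction txn = env.beginTransaction(null, null);
            try {
                for (int metricId : metricIds) {
                    DatabaseEntry key = new DatabaseEntry(CompositeKey.encode(itemId, metricId));
                    DatabaseEntry value = new DatabaseEntry();
                    // RMW takes a write lock up front, so the read-modify-write cycle is safe.
                    OperationStatus status = db.get(txn, key, value, LockMode.RMW);
                    long counter = (status == OperationStatus.SUCCESS)
                            ? ByteBuffer.wrap(value.getData()).getLong()
                            : 0L;
                    value.setData(ByteBuffer.allocate(8).putLong(counter + 1).array());
                    db.put(txn, key, value);
                }
                txn.commit();
                return;
            } catch (LockConflictException conflict) {
                txn.abort();   // conflicting concurrent update: retry the whole batch
            }
        }
    }
}
```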