
TL;DR

How can you find "unreachable keys" in a key/value store with a large amount of data?

Background

In comparison to relational databases that provide ACID guarantees, NoSQL key/value databases provide fewer guarantees in order to handle "big data". For example, they typically provide atomicity only in the context of a single key/value pair, but they use techniques like distributed hash tables to "shard" the data across an arbitrarily large cluster of machines.

Keys are often unfriendly for humans. For example, a key for a blob of data representing an employee might be Employee:39045e87-6c00-47a4-a683-7aba4354c44a. The employee might also have a more human-friendly identifier, such as the username jdoe with which the employee signs in to the system. This username would be stored as a separate key/value pair, where the key might be EmployeeUsername:jdoe. The value for the key EmployeeUsername:jdoe is typically either an array of strings containing the main key (think of it as a secondary index, which does not necessarily contain unique values) or a denormalised version of the employee blob (perhaps aggregating data from other objects in order to improve query performance).
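For concreteness, the two pairs described above might look like this, modelling the store as a plain dict. The key names follow the examples in the question; the JSON blob format is an assumption for the sake of illustration:

```python
# Illustration of the two key/value pairs: a primary record under an
# opaque UUID key, and a manual secondary index under a human-friendly key.
store = {}

# Primary record: opaque UUID key -> serialized employee blob.
store["Employee:39045e87-6c00-47a4-a683-7aba4354c44a"] = (
    '{"username": "jdoe", "name": "Jane Doe"}'
)

# Manual secondary index: human-friendly key -> array of primary keys
# (an array, because a secondary index need not contain unique values).
store["EmployeeUsername:jdoe"] = ["Employee:39045e87-6c00-47a4-a683-7aba4354c44a"]

# A lookup by username is then two reads: the index first, then the primary key.
employee_blob = store[store["EmployeeUsername:jdoe"][0]]
```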

Problem

Now, given that key/value databases do not usually provide transactional guarantees, what happens when a process inserts the key Employee:39045e87-6c00-47a4-a683-7aba4354c44a (along with the serialized representation of the employee) but crashes before inserting the EmployeeUsername:jdoe key? The client does not know the key for the employee data - he or she only knows the username jdoe - so how do you find the Employee:39045e87-6c00-47a4-a683-7aba4354c44a key?

The only thing I can think of is to enumerate the keys in the key/value store and, once you find the appropriate key, "resume" the indexing/denormalisation. I'm well aware of techniques like event sourcing, where an idempotent event handler could respond to the event (e.g., EmployeeRegistered) in order to recreate the username-to-employee-uuid secondary index, but using event sourcing over a key/value store still requires enumeration of keys, which could degrade performance.
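A minimal sketch of that "enumerate and resume" repair, assuming the employee blob itself records the username (key names as in the question; note the scan is O(n) over the whole store, which is exactly the performance concern):

```python
import json

# Scan every Employee:* key and recreate any missing EmployeeUsername:*
# index entry. Assumes the blob is JSON containing a "username" field.
store = {
    "Employee:39045e87-6c00-47a4-a683-7aba4354c44a": '{"username": "jdoe"}',
    # The EmployeeUsername:jdoe entry is missing -- the crash scenario.
}

def rebuild_username_index(store):
    for key in list(store):            # full key enumeration: O(n) over the store
        if not key.startswith("Employee:"):
            continue
        username = json.loads(store[key])["username"]
        refs = store.setdefault(f"EmployeeUsername:{username}", [])
        if key not in refs:            # idempotent: safe to re-run after a crash
            refs.append(key)

rebuild_username_index(store)
```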

Analogy

The more experience I have in IT, the more I see the same problems being tackled in different scenarios. For example, Linux filesystems store both file and directory contents in "inodes". You can think of these as key/value pairs, where the key is an integer and the value is the file/directory contents. When writing a new file, the system creates an inode and fills it with data THEN modifies the parent directory to add the "filename-to-inode" mapping. If the system crashes after creating the file but before referencing it in the parent directory, your file "exists on disk" but is essentially unreadable. When the system comes back online, hopefully it will place this file into the "lost+found" directory (I imagine it does this by scanning the entire disk). There are plenty of other examples (such as domain name to IP address mappings in the DNS system), but I specifically want to know how the above problem is tackled in NoSQL key/value databases.

EDIT

I found this interesting article on manual secondary indexes, but it doesn't address "broken" or "dated" secondary indexes.

magnus
  • For what are you using the NoSQL key/value database? For readmodels or as event store? – Constantin Galbenu Jun 20 '17 at 07:21
  • The wording in the question implied the write model, and although it mentioned event sourcing, it is more general in nature. E.g., how is this handled in key/value stores in general, regardless of whether event sourcing or "current state" is in use. – magnus Jun 20 '17 at 07:26
  • In event sourcing, every event store (as the write model persistence) that I've heard of guarantees atomicity per commit (commit = the events generated by a single command). So, in the case of event sourcing, your question does not apply, at least not when persisting a single aggregate (as your example is, the user is a single aggregate; the user Id and the username are props of the same aggregate and both are needed to keep the user in a consistent/valid state). Am I correct? – Constantin Galbenu Jun 20 '17 at 07:33
  • Also, when not using event sourcing, for write model persistence, in order to avoid what you are describing, transactions are used. So, the conclusion is that you cannot use a NoSQL key/value store as a write model persistence. – Constantin Galbenu Jun 20 '17 at 07:52
  • No, I think you've misunderstood. Forget event sourcing for a minute. The client registers, creating a KV pair "Employee:123456789" -> [BLOB]. The system crashes before inserting KV pair "Username:jdoe" -> "Employee:123456789". The employee data exists but is "unreachable". How do you deal with this? – magnus Jun 20 '17 at 08:52
  • You cannot have this situation, that's what I'm trying to tell you. Your aggregate's repository *must have* atomicity: when you persist an aggregate, all its properties must be persisted or none. If your key-value persistence implementation cannot do that then change it. Eric Evans proposes the use of a NoSQL key-value store as a repository, but only if you serialize the *entire* aggregate and store it as a value under the key 'Employee:123456789' – Constantin Galbenu Jun 20 '17 at 09:01
  • The [BLOB] part of the KV pair "Employee:123456789" in my previous comment **would** contain the username "jdoe". The KV pair "Username:jdoe" -> "Employee:123456789" is a (denormalised) secondary index. – magnus Jun 20 '17 at 09:33
  • And you are asking how to keep them in sync (the source of truth aka the repository and the secondary readmodel)? – Constantin Galbenu Jun 20 '17 at 09:49

2 Answers


The solution I've come up with is to use a process manager (or "saga"), whose key contains the username. This also guarantees uniqueness across employees during registration. (Note that I'm using a key/value store with compare-and-swap (CAS) semantics for concurrency control.)

  1. Create an EmployeeRegistrationProcess with a key of EmployeeRegistrationProcess:jdoe.

    If a concurrency error occurs (i.e., the registration process already exists) then this is a duplicate username.

  2. When started, the EmployeeRegistrationProcess allocates an employee UUID. The EmployeeRegistrationProcess attempts to create an Employee object using this UUID (e.g., Employee:39045e87-6c00-47a4-a683-7aba4354c44a).

    If the system crashes after starting the EmployeeRegistrationProcess but before creating the Employee, we can still locate the "employee" (or more accurately, the employee registration process) by the username "jdoe". We can then resume the "transaction".

    If there is a concurrency error (i.e., an Employee with the generated UUID already exists), the EmployeeRegistrationProcess can flag itself as "in error" or "for review", or take whatever action we decide is best.

  3. After the Employee has successfully been created, the EmployeeRegistrationProcess creates the secondary index EmployeeUsernameToUuid:jdoe -> 39045e87-6c00-47a4-a683-7aba4354c44a.

    Again, if this fails, we can still locate the "employee" by the username "jdoe" and resume the transaction.

    And again, if there is a concurrency error (i.e., the EmployeeUsernameToUuid:jdoe key already exists), the EmployeeRegistrationProcess can take appropriate action.

  4. When both commands have succeeded (the creation of the Employee and the creation of the secondary index), the EmployeeRegistrationProcess can be deleted.

At all stages of the process, the Employee (or EmployeeRegistrationProcess) is reachable via its human-friendly identifier "jdoe". Event sourcing the EmployeeRegistrationProcess is optional.

Note that using a process manager can also help in enforcing uniqueness across usernames after registration. That is, we can create an EmployeeUsernameChangeProcess object with a key containing the new username. "Enforcing" uniqueness at either registration or username change hurts scalability, so the value identified by "EmployeeUsernameToUuid:jdoe" could instead be an array of employee UUIDs.
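The four steps above can be sketched as follows, over an in-memory dict with put-if-absent as a minimal stand-in for the store's CAS semantics. The helper names are illustrative, not a real client API:

```python
import uuid

store = {}

def put_if_absent(key, value):
    """Succeed only if the key does not already exist (create-only CAS)."""
    if key in store:
        return False
    store[key] = value
    return True

def register(username):
    # 1. Create the process object; a conflict here means a duplicate username.
    proc_key = f"EmployeeRegistrationProcess:{username}"
    employee_id = str(uuid.uuid4())
    if not put_if_absent(proc_key, {"employee_id": employee_id}):
        raise ValueError(f"username {username!r} already registered or in progress")

    # 2. Create the Employee object. If we crash before this point, the
    #    process object is still reachable by username and can be resumed.
    if not put_if_absent(f"Employee:{employee_id}", {"username": username}):
        raise RuntimeError("UUID conflict: flag the process for review")

    # 3. Create the secondary index entry.
    if not put_if_absent(f"EmployeeUsernameToUuid:{username}", employee_id):
        raise RuntimeError("index conflict: flag the process for review")

    # 4. Both writes succeeded, so the process object can be deleted.
    del store[proc_key]
    return employee_id
```

A real implementation would resume an interrupted process from the persisted process object rather than raising, but the reachability property is the same: at every step, the username key leads back to the in-flight registration.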

magnus

Looking at the question from the point of view of event-sourced entities, the responsibility of the event store includes guaranteeing that an event is saved to storage and published to the bus. From that point of view, it is guaranteed that the event will be written completely, and since the database is append-only, there will never be a problem with an invalid event.

At the same time, of course, it isn't guaranteed that all commands which generate events will be executed successfully - it is only possible to guarantee ordering and protection against repeated execution of the same command, not the whole transaction.

It then proceeds as follows: a saga intercepts the original command and tries to execute the whole transaction. If any part of the transaction ends with an error, or, for example, doesn't complete within a preset time, the process is rolled back by generating so-called compensating events. Such events can't delete an entity, but they bring the system to a consistent state, as if the command had never been executed.
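A rough sketch of that compensation flow (names and structure are hypothetical; the event store is modelled as an append-only list):

```python
# Saga-style compensation: if any step of the multi-step "transaction"
# fails, append compensating events that undo the effect of the steps
# already committed, instead of deleting anything from the log.
events = []  # append-only event log

def run_saga(steps, compensations):
    """Run (name, step) pairs in order; on failure, append the
    compensating event for every committed step, in reverse order."""
    done = []
    for name, step in steps:
        try:
            events.append(step())
            done.append(name)
        except Exception:
            for prev in reversed(done):
                events.append(compensations[prev]())
            return False
    return True
```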

Note: if your specific event-database implementation can only guarantee the write of a single key/value pair, just serialize each event, and use the combination of the aggregate root's identifier and version as the key. The aggregate version in this case acts somewhat like a CAS operation.
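A minimal sketch of that versioned-key idea, assuming only per-pair atomicity (key names are illustrative): a put-if-absent on the (aggregate id, version) key fails if another writer committed that version first, which is what makes it a CAS analog.

```python
# Append-only event write keyed by (aggregate id, expected version).
# A conflict on the versioned key signals a concurrent writer.
store = {}

def append_event(aggregate_id, expected_version, event):
    key = f"{aggregate_id}:{expected_version}"
    if key in store:  # another writer already committed this version
        raise RuntimeError("concurrency conflict: reload the aggregate and retry")
    store[key] = event
```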

About concurrency you can read this article: http://danielwhittaker.me/2014/09/29/handling-concurrency-issues-cqrs-event-sourced-system/

Vladislav Ihost