If you use a schemaless database (particularly a document-oriented database such as CouchDB, Couchbase, or MongoDB) and want to change the format in which a particular kind of object is stored, you can leave existing records in the old format and create new records in the new format. This is often cited as one of the major advantages of schemaless databases (I think because it lets you avoid downtime). On the other hand, it is inconvenient and inefficient to deal with many formats of the same kind of data. So what are good approaches or strategies for migrating data from one format to another in a schemaless database?
1 Answer
As with everything, there are many different ways to handle this. In schemaless development you are generally cognizant of the data you are storing. It's not that the schema is missing: all data has an implicit schema, so what we are really saying is that the database does not enforce one. If I have a user object with 10 instance variables that I store as JSON, there IS a schema there! There are three cases to consider:
Case 1: a value may take different shapes: a single value, an array, or a nested structure
Case 2: a value needs to be changed from one format to another, e.g. from a single value to an array of values
Case 3: the presence or absence of a JSON key; this is pretty straightforward
For Case 1: if you expect variety in a JSON value, that variety has to be handled in your app code logic: if it's a string, do this; if it's an array, do that.
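A minimal sketch of that kind of branching, in Python. The function name `normalize_comments` and the nested shape `{"items": [...]}` are assumptions made up for illustration; the point is only that the app code maps every shape it expects onto one internal representation.

```python
def normalize_comments(value):
    """Return the stored comment value as a list of strings, whatever its shape."""
    if value is None:
        # Key missing or null: treat as "no comments".
        return []
    if isinstance(value, str):
        # Single value: wrap it.
        return [value]
    if isinstance(value, list):
        # Already an array: copy it, coercing items to strings.
        return [str(item) for item in value]
    if isinstance(value, dict):
        # Hypothetical nested structure: {"items": [...]}.
        return normalize_comments(value.get("items"))
    raise TypeError(f"unsupported comment shape: {type(value).__name__}")
```

The rest of the app can then work with a list only, and the shape checks live in one place.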
For Case 2: one approach is to handle this "On Request" or "On Demand": you bake the transformation logic into your class methods, so that data is transformed from the old format to the new one when it is retrieved. You can also flag the document to indicate that it has been transformed. Since this is On Demand, some data in your document store may never be transformed, but anything that does get requested will be.
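As a rough sketch of the On Demand idea, assuming the document carries a numeric `version` field as its "transformed" flag and that a plain Python dict stands in for the document store (both are assumptions for illustration, not a real database client):

```python
import copy

CURRENT_VERSION = 1.01  # assumed version number of the new format


def migrate_on_read(doc):
    """Return (doc, changed); upgrade old-format documents without dropping data."""
    if doc.get("version", 1.00) >= CURRENT_VERSION:
        return doc, False
    upgraded = copy.deepcopy(doc)
    old = upgraded.get("my_comment")
    # Non-destructive: the old key stays, the new key holds the array form.
    upgraded["my_new_comment"] = [] if old is None else [old]
    upgraded["version"] = CURRENT_VERSION  # flag the document as transformed
    return upgraded, True


def load_user(store, key):
    """Read a document, transforming it (and saving it back) if it is old."""
    doc, changed = migrate_on_read(store[key])
    if changed:
        store[key] = doc
    return doc
```

Documents that are never requested simply stay at the old version, which is the trade-off described above.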
Alternative approach for Case 2: iterate through the data and transform it with worker processes. Rather than waiting for a document to be requested, you create a job that changes the data the way you want, baking the transformation logic into the workers themselves (which can use the same class definitions as your app code). In Couchbase you can create a View (secondary index) or use Elasticsearch to iterate through documents of a particular type. If you create a workflow system, you can do a lot of this in parallel with many workers.
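A minimal sketch of the worker approach, under loud assumptions: a dict plays the role of the document store, `keys_needing_migration` stands in for a view or index query, and a thread pool stands in for real worker processes. None of these names come from Couchbase or any other client library.

```python
from concurrent.futures import ThreadPoolExecutor

TARGET_VERSION = 1.01  # assumed version number of the new format


def transform(doc):
    """Non-destructive transform: add the array form, keep the old key."""
    new_doc = dict(doc)
    old = new_doc.get("my_comment")
    new_doc["my_new_comment"] = [] if old is None else [old]
    new_doc["version"] = TARGET_VERSION
    return new_doc


def keys_needing_migration(store, doc_type):
    """Stand-in for a view/index query: old documents of one type."""
    return [
        key
        for key, doc in store.items()
        if doc.get("type") == doc_type and doc.get("version", 1.00) < TARGET_VERSION
    ]


def migrate_one(store, key):
    """Job run by a single worker."""
    store[key] = transform(store[key])
    return key


def run_migration(store, doc_type, workers=4):
    """Fan the keys out to a pool of workers; return the migrated keys."""
    keys = keys_needing_migration(store, doc_type)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sorted(pool.map(lambda key: migrate_one(store, key), keys))
```

Because `transform` is an ordinary function, the same code can be shared with the app's own classes. Against a real store, each write would also need some protection against concurrent updates to the same document.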
When I do transformations, I generally transform one JSON k/v into another JSON k/v in a non-destructive way, so that if I have made an error in my process, the original data is not altered. I can then have a later phase that removes the old JSON k/v "On Demand", if I even feel that is necessary. This is a safer approach to this type of operation.
Appended
Case 1 & 2: Data Transformation
Original JSON Document
user::101
{
"uid": 1234,
"type": "user",
"my_comment": "the quick brown fox jumped over the lazy dog",
"version": 1.00
}
Now let's say I want to change it in a non-destructive way. I can simply add a new JSON key that holds the transformed data:
user::101
{
"uid": 1234,
"type": "user",
"my_new_comment": ["the quick brown fox jumped over the lazy dog", "comment2"],
"my_comment": "the quick brown fox jumped over the lazy dog",
"version": 1.01
}
Notice that this is non-destructive: the old JSON key is still there. Alternatively, I can save the old data under a new key and change the expected JSON key to the new format (an array instead of a string):
user::101
{
"uid": 1234,
"type": "user",
"my_comment": ["the quick brown fox jumped over the lazy dog", "comment2"],
"my_comment_v1.00": "the quick brown fox jumped over the lazy dog",
"version": 1.01
}
Obviously there is quite a variety of schemes you could use, depending on your preferences.
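The two layouts above, plus the optional later cleanup phase, can be sketched as plain functions over a dict. The function names are made up for illustration, and the sketch only wraps the existing comment in an array; it does not add the extra "comment2" entry shown in the examples.

```python
NEW_VERSION = 1.01


def add_new_key(doc):
    """Scheme 1: keep my_comment as-is and add my_new_comment in the array format."""
    out = dict(doc)
    out["my_new_comment"] = [doc["my_comment"]]
    out["version"] = NEW_VERSION
    return out


def archive_old_key(doc):
    """Scheme 2: move the old value to a versioned key and reuse my_comment for the array."""
    out = dict(doc)
    out[f"my_comment_v{doc['version']:.2f}"] = doc["my_comment"]
    out["my_comment"] = [doc["my_comment"]]
    out["version"] = NEW_VERSION
    return out


def drop_old_keys(doc, old_keys):
    """Later cleanup phase: remove the preserved old keys once you trust the new data."""
    return {key: value for key, value in doc.items() if key not in old_keys}
```

For example, `archive_old_key` applied to the original document yields a `my_comment_v1.00` key holding the old string, and `drop_old_keys(doc, {"my_comment_v1.00"})` removes it again when it is no longer needed.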

Could you give an example of how you do the migration in a non-destructive way? Basically you want to replace k/v1 with k/v2. As far as I can understand, this means that k/v1 should be 'destroyed'. – Alexey Nov 21 '12 at 15:22
-
Basically, from your answer I see one possible migration procedure as follows. When I change the format of the data: 1) the app code should be able to work with both the old and the new format and distinguish one from the other; 2) the app code that uses the data should be able to convert the old format to the new one on the fly and work with the new one; 3) a worker process is started that converts records from the old format to the new format and saves them. – Alexey Nov 21 '12 at 15:30
-
let's see if I can do this in a comment on SO... let's say the document is: ah, can't do return chars, let me create a gist... I'll edit my answer to help understand and throw in a gist. – scalabl3 Nov 21 '12 at 19:25
-
didn't need a gist, just put it into the answer, make sense? if not, please feel free to ask more questions! – scalabl3 Nov 21 '12 at 19:37