27

I'm looking for a way to automate schema migration for databases such as MongoDB or CouchDB.

Preferably, this tool should be written in Python, but any other language is OK.

Alexander Artemenko
  • The question is how one emulates relational features in NoSQL? For example, what's the right way to do many-to-many relations in key-value storage? Or constraints? Welcome to SO, BTW :-) – sastanin Dec 27 '09 at 17:41
  • 2
    No. I mean schema migration. How to migrate from one document version to another (rename fields, etc.). – Alexander Artemenko Dec 28 '09 at 09:13
  • For mongodb, there's a python package called mongodb-migrations that allows for schema-style migrations. https://pypi.org/project/mongodb-migrations/ This was useful for me when we changed a property in one type of document from a String to an Array. – Caleb Jay Aug 22 '23 at 10:25

4 Answers

20

Since a NoSQL database can contain huge amounts of data, you cannot migrate it in the regular RDBMS sense. Actually, you can't do it for an RDBMS either once your data passes some size threshold. It is impractical to bring your site down for a day to add a field to an existing table, so with an RDBMS you end up doing ugly patches like adding new tables just for the field and doing joins to get to the data. In the NoSQL world you can do several things.

  • As others suggested, you can write your code so that it handles different 'versions' of the possible schema. This is usually simpler than it looks. Many kinds of schema changes are trivial to code around. For example, if you want to add a new field to the schema, you just add it to all new records and it will be empty on all the old records (you will not get "field doesn't exist" errors or anything ;). If you need a 'default' value for the field in the old records, that too is trivially done in code.
  • Another option, and actually the only sane option going forward with non-trivial schema changes like field renames and structural changes, is to store schema_version in EACH record and to have code that migrates data from any version to the next on READ. I.e., if your current schema version is 10 and you read a record from the database with version 7, then your DB layer should call migrate_8, migrate_9, and migrate_10. This way the data that is accessed will be gradually migrated to the new version. And if it is not accessed, then who cares which version it is ;)
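
The migrate-on-read idea from the second bullet could look roughly like the sketch below. It works on plain dicts, so it is not tied to any particular database driver. The field names (name, full_name, tags), the version numbers, and the helper names are all made up for illustration; only schema_version and the migrate_N naming come from the answer itself.

```python
CURRENT_VERSION = 3  # hypothetical: the newest schema version the code knows

def migrate_2(doc):
    # v1 -> v2: rename the (hypothetical) field "name" to "full_name"
    doc["full_name"] = doc.pop("name", "")
    return doc

def migrate_3(doc):
    # v2 -> v3: turn a comma-separated "tags" string into a list
    tags = doc.get("tags", "")
    if isinstance(tags, str):
        doc["tags"] = [t for t in tags.split(",") if t]
    return doc

# Maps a target version to the function that upgrades a doc to it
MIGRATIONS = {2: migrate_2, 3: migrate_3}

def upgrade(doc):
    """Return a copy of doc upgraded step by step to CURRENT_VERSION."""
    # Records written before versioning existed are treated as version 1
    version = doc.get("schema_version", 1)
    doc = dict(doc)  # shallow copy, so the caller's dict is not modified
    for target in range(version + 1, CURRENT_VERSION + 1):
        doc = MIGRATIONS[target](doc)
        doc["schema_version"] = target
    return doc
```

In a real DB layer you would call upgrade() on every document right after reading it and, if the version changed, write the result back, so that frequently accessed data drifts to the newest version over time.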
Vitaly Kushner
2

One of the supposed benefits of these databases is that they are schemaless, and therefore don't need schema migration tools. Instead, you write your data handling code to deal with the variety of data stored in the db.

Ned Batchelder
  • 5
    It is hard to write code to handle all versions of the documents. Code evolves and the database should evolve too. Such databases are not schemaless, they are schema free. This means that you can have some document structures, but there are no strong restrictions. – Alexander Artemenko Dec 25 '09 at 16:18
  • 2
    I think that for NoSQL databases we need "data migration" tools rather than "schema migration" tools. If there isn't one, then I'll write one myself. – Alexander Artemenko Dec 25 '09 at 16:20
  • I'm not sure what the distinction is between "schemaless" and "schema free". In any case, one advantage of these databases is that you don't have to update all of the data when the schema changes. You could, for example, update each record/document as it is read and discovered to be in an old format. If you don't find any tools that do what you want, you are either blazing a new trail, or not understanding the NoSQL culture. – Ned Batchelder Dec 26 '09 at 15:25
  • 2
    OK. To update data to a new version I need a tool anyway. In my opinion, it is more convenient than having code which works with all versions of the documents. Do you really not understand the difference between schemaless and schema-free? :-) – Alexander Artemenko Dec 27 '09 at 13:07
  • 5
    This is not a constructive answer. Alexander is asking for a tool; he is not expecting someone to explain why he possibly doesn't need such a tool. You are not aware of the reasons why he actually needs it, even though the database is schema[less/free]. It is useful to simplify the code by not having to manage multiple versions of your data, and therefore to migrate the data when the structure of a collection changes. – Romain G Feb 24 '16 at 11:00
2

If your data are sufficiently big, you will probably find that you cannot EVER migrate the data, or that it is not beneficial to do so. This means that when you do a schema change, the code needs to continue to be backwards compatible with the old formats forever.

Of course if your data "age" and eventually expire anyway, this can do schema migration for you - simply change the format for newly added data, then wait for all data in the old format to expire - you can then retire the backward-compatibility code.
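
As a rough sketch of that approach: keep a small compatibility shim while old-format records can still exist, and compute when it becomes safe to delete it. The field names (user, user_id) and both function names are invented for this example, and it assumes every record expires after a fixed retention period.

```python
from datetime import datetime, timedelta

def normalize(record):
    """Return the record in the new format, whichever format it was stored in."""
    # Hypothetical change: the old format stored "user", the new one "user_id"
    if "user_id" not in record:
        record = {**record, "user_id": record.get("user")}
        record.pop("user", None)
    return record

def can_retire_compat_code(format_changed_at, retention, now):
    """True once every record written in the old format must have expired."""
    return now >= format_changed_at + retention
```

Once can_retire_compat_code() returns True, normalize() has nothing left to convert and can be removed.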

MarkR
1

A project needing a schema migration for a NoSQL database makes me think that you are still thinking in a relational database manner while using a NoSQL database.

If anybody is going to start working with NoSQL databases, they need to realize that most of the 'rules' for an RDBMS (e.g. MySQL) need to go out the window too: things like strict schemas, normalization, and many relationships between objects. NoSQL exists to solve problems that don't need all the extra 'features' provided by an RDBMS.

I would urge you to write your code in a manner that doesn't expect or need a hard schema for your NoSQL database. You should support an old schema and convert a document record on the fly when you access it, if you really want more schema fields on that record.

Please keep in mind that NoSQL storage works best when you think and design differently compared to when using an RDBMS.

Astra
  • This is not a solution. Thank you for your "interesting" IMHO. – Alexander Artemenko Jun 10 '10 at 11:20
  • 2
    No, this isn't a 'solution' - nor is the accepted answer as it is basically a 'you cannot do it' if you look at the answers in the same way. All I am trying to do is draw attention to the fact that one should really question themselves if they *really* need a hard schema on a NoSQL database. Schemas can cause problems at scale, which is one reason that NoSQL is a good scaling solution, they don't have hard schemas. – Astra Jun 10 '10 at 20:00
  • 4
    The fact that you use a NoSQL database doesn't mean that you have to forget good practices learnt by using RDBMSs for two decades. On the contrary, there are multiple tools that provide application-level schema validation of the data. NoSQL makes the bet of increasing speed using denormalization, which was already used in RDBMSs (and that's part of how NoSQL was invented); it doesn't mean that everything from RDBMSs should be thrown away. This strongly depends on the application you are developing. – Romain G Feb 24 '16 at 11:03
  • This is actually the solution and also the correct answer. The question is wrong, and this answer demonstrates why. – enanone Dec 03 '21 at 08:02