20

I'm exploring Mongo as an alternative to relational databases but I'm running into a problem with the concept of schemaless collections.

In theory it sounds great, but as soon as you tie a model to a collection, the model becomes your de facto schema. You can no longer just add or remove fields from your model and expect it to continue to work. I see the same problems here managing changes as you have with a relational database, in that you need some sort of script to migrate from one version of the database schema to the other.

Am I approaching this from the wrong angle? What approaches do members here take to ensure that their collection items stay in sync with their domain model when making updates to their domain model?

Edit: It's worth noting that these problems obviously exist in relational databases as well, but I'm asking specifically for strategies in mitigating the problem using schemaless databases and more specifically Mongo. Thanks!

Timothy Strimple
  • Good question. I think this is more of an issue with strongly typed languages like C#, where any change to a model would break the mapping between existing documents and the model class. In more dynamic languages like Python, you can treat each document more as a "property bag" that truly has no schema. Then you can test for existence of certain attributes (which can be added or removed at any time). Perhaps using C#'s `dynamic` keyword and `ExpandoObject`s is the way to go. Obviously, your code will still need to handle changes made in the schema as time goes by, but it'll be more flexible – Cameron Sep 02 '11 at 06:02
  • Excellent idea. I've sort of filed the dynamic keyword away as "used for duck typing" but I think there is some very interesting applications here. After a little searching: (http://www.scottw.com/mongodb-dynamics) I'll look into this some more. – Timothy Strimple Sep 02 '11 at 06:09
  • This is why I love stackoverflow. 13 views and already two great suggestions! – Timothy Strimple Sep 02 '11 at 06:20

4 Answers

14

Schema migration with MongoDB is actually a lot less painful than with, say, SQL Server.

Adding a new field is easy: old records will come back with it set to null, or you can use attributes to control the default value, e.g. [BsonDefaultValue("abc", SerializeDefaultValue = false)].

The [BsonIgnoreIfNull] attribute is also handy for omitting properties that are null from the document when it is serialized.

Removing a field is fairly easy too: you can use [BsonExtraElements] (see docs) to collect removed fields and preserve them, or you can use [BsonIgnoreExtraElements] to simply throw them away.

With these in place there is really no need to convert every record to the new schema up front; you can do it lazily as needed when records are updated, or slowly in the background.
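To make that concrete, here is a sketch of how those attributes might sit on a model class. The class and property names are illustrative, not from the question; the attributes themselves come from the MongoDB C# driver's MongoDB.Bson.Serialization.Attributes namespace.

```csharp
using MongoDB.Bson.Serialization.Attributes;

// Illustrative model: old documents without Status deserialize with the
// default "active", and Nickname is omitted from the BSON when it is null.
[BsonIgnoreExtraElements] // silently drop fields the class no longer declares
public class Customer
{
    public string Id { get; set; }

    // Default applied when the field is missing; not written back when
    // the value still equals the default.
    [BsonDefaultValue("active", SerializeDefaultValue = false)]
    public string Status { get; set; }

    [BsonIgnoreIfNull]
    public string Nickname { get; set; }
}
```

With this in place, documents written before Status existed and documents carrying since-deleted fields both round-trip without errors.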


PS: since you are also interested in using dynamic with Mongo, here's an experiment I tried along those lines. And here's an updated post with a complete serializer and deserializer for dynamic objects.

Ian Mercer
  • Thanks for the suggestions. I'd prefer not to litter my domain model with Mongo specific attributes, but it appears I can also use this functionality in the ClassMap. I'll look into this some more. Things are more painful right now because I'm early in development and my domain is a lot more volatile than it normally would be. At this point I'm just dropping collections as it's easier than dealing with the "schema" mismatch. – Timothy Strimple Sep 02 '11 at 06:16
  • That's correct - you can either use the attributes, or if you prefer to keep your classes 'pure POCO' the driver also has class maps. – Chris Fulstow Sep 02 '11 at 06:42
  • The class maps have been invaluable, so definitely look at them. The ability to register key/ID generators at the driver level, without having to mark up your model at all, is also invaluable if you don't want to be tied to string/Guid keys. – Vassi Sep 09 '11 at 19:55
  • @TimothyStrimple I shared your sympathy about not wanting to litter my domain model with Mongo-specific attributes. I struggled with this a lot; in the end I gave up, because the Mongo-specific attributes are what make it very flexible and easy to use. – atbebtg Oct 06 '11 at 18:48
  • @atbebtg I was able to accomplish this without using Mongo attributes on my domain objects. I used the BsonClassMap functionality to add the details in the repository. http://www.mongodb.org/display/DOCS/CSharp+Driver+Serialization+Tutorial#CSharpDriverSerializationTutorial-Creatingaclassmap – Timothy Strimple Oct 06 '11 at 18:55
  • I guess you have to add the field in for every document if you're searching against that field? – Donny V. Jan 21 '12 at 02:53
0

My current thought on this is to use the same sort of implementation I would use with a relational database: keep a database version collection which stores the current version of the database schema.

My repositories would declare a minimum version they require to accurately serialize and deserialize the collection's items. If the current database version is lower than the required version, I just throw an exception. Then I use migrations which do whatever conversion is necessary to bring the collections up to the required state, and update the database version number.
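A minimal sketch of that version gate, with illustrative names (the DatabaseVersion document, RepositoryBase, and EnsureVersion are assumptions for this example, not an existing API):

```csharp
using System;

// Single document stored in a "databaseVersion" collection.
public class DatabaseVersion
{
    public int Version { get; set; }
}

// Each repository declares the minimum schema version it understands.
public abstract class RepositoryBase
{
    protected abstract int MinimumRequiredVersion { get; }

    // Call before any serialization work; fail fast if migrations
    // haven't been run yet.
    protected void EnsureVersion(int currentDbVersion)
    {
        if (currentDbVersion < MinimumRequiredVersion)
        {
            throw new InvalidOperationException(string.Format(
                "Database schema version {0} is below required version {1}; run migrations first.",
                currentDbVersion, MinimumRequiredVersion));
        }
    }
}
```

A migration runner would then apply each pending migration in order and bump the stored Version as it goes, much like relational migration tools do.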

Timothy Strimple
0

With statically-typed languages like C#, whenever an object gets serialised somewhere, its original class changes, and it's then deserialised back into the new class, you're probably going to run into problems somewhere along the line. That's fairly unavoidable whether it's MongoDB, WCF, XmlSerializer or whatever.

You've usually got some flexibility with serialization options. For example, with Mongo you can change a class property name but still have its value map to the same field name (e.g. using the BsonElement attribute). Or you can tell the deserializer to ignore Mongo fields that don't have a corresponding class property, using the BsonIgnoreExtraElements attribute, so deleting a property won't cause an exception when an old field is loaded from Mongo.
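Both options can be sketched in a few lines; the class and field names here are made up for illustration, while the attributes are the driver's own:

```csharp
using MongoDB.Bson.Serialization.Attributes;

// Illustrative: the property was renamed from EmailAddress to Email in code,
// but existing documents still store the old field name.
[BsonIgnoreExtraElements] // deleted properties won't throw on deserialization
public class User
{
    public string Id { get; set; }

    [BsonElement("EmailAddress")] // map the renamed property to the old BSON field
    public string Email { get; set; }
}
```

This keeps old documents readable without touching the data; a background rewrite to the new field name can happen later, if at all.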

Overall though, for any structural schema changes you'll probably need to reload the data or run a migration script. The other alternative is to use C# dynamic variables, although that doesn't really solve the underlying problem; you'll just get fewer serialization errors.

Chris Fulstow
0

I've been using MongoDB for a little over a year now, though not for very large projects. I use hugo's csmongo or the fork here. I like the dynamic approach it introduces, which is especially useful for projects where the database structure is volatile.

ritcoder