
What are the pros and cons of using a dump-file as a basis of data and schema migration, as opposed to fully script based or a database delta tool?

The context is that the application is in production, and there is only one production database. The application and database schema are in active development. Critical user data exists in the production database and must be rolled forward with deployment of new versions or fixes.

The solutions being discussed are:

Dump file basis -

  1. Start with a reference point dump file.
  2. Database alter scripts are checked into source control.
  3. Deployment entails loading the dump file and then running the alter scripts (a sketch of such a script follows this list).
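As a rough sketch of what one of those alter scripts might look like (the file name, table, and column names below are purely illustrative, not from the actual project):

    -- 003_add_email_to_users.sql  (checked into source control; run after loading the reference dump)
    -- Illustrative only: table and column names are hypothetical.
    ALTER TABLE users ADD email VARCHAR(255);

    -- Backfill existing rows so the new column is usable immediately.
    UPDATE users SET email = username || '@example.invalid' WHERE email IS NULL;
    COMMIT;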

Schema + migration

  1. Entire schema and certain non-user configuration data are stored as DDLs and DMLs in SCM.
  2. Migrations scripts against the latest release's schema are stored in SCM.
  3. Deployment entails loading the schema and then migrating the data (see the sketch after this list).
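For concreteness, here is a minimal sketch of how the baseline schema, configuration data, and a migration-tracking table could be kept as plain SQL in SCM; every object and file name below is hypothetical:

    -- schema/001_baseline.sql -- baseline DDL kept in source control
    CREATE TABLE app_config (
        config_key   VARCHAR(64)  PRIMARY KEY,
        config_value VARCHAR(255) NOT NULL
    );

    -- data/config.sql -- non-user configuration data (DML), also versioned
    INSERT INTO app_config (config_key, config_value) VALUES ('feature_x_enabled', 'true');

    -- migrations/schema_version.sql -- records which migration scripts have been applied,
    -- so the deployment tooling runs only the ones newer than the production database.
    CREATE TABLE schema_version (
        version    INTEGER   PRIMARY KEY,
        applied_at TIMESTAMP NOT NULL
    );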

My intuition is that using a binary format as the basis is bad, but if that is indeed the case, I need to be able to convince others who argue that it is necessary.


I re-formulated this question to make it easier to answer.

Below is the original question:

I am working with a team on a database-driven enterprise application and looking to improve our process. Currently, there is a lot of manual work involved in updating the database on all tiers. The goal is an automated process for updating the database consistently across tiers (in line with the idea of atomic commits and a step closer to continuous delivery), which would bring numerous advantages.

I think that the schema (and certain data necessary for application configuration) should be represented in source control, along with any scripts necessary to transform and load user data from the current production database. I have read that it is not advisable to keep a dump file (.dmp) in source control, and intuitively I agree strongly. However, not everyone on the project agrees with me. The counter-argument is that in reality it is not possible, or at least too difficult, not to start with a dump file. I am up against the limit of my database knowledge and can't really debate this meaningfully; I am more of a developer than a database specialist. The alternative suggested is to keep alter scripts that bring the dump up to the latest schema.

Can someone help me understand the pros and cons of each approach a little better? Is the dump-based approach necessary, a good idea, or a bad idea, and why?

A little background that may be relevant: the application is in production, so each new version must import data as part of the deployment process, and for obvious reasons this should be real data on the integration and UAT tiers. However, this app is not "shipped" and installed by customers; there is only one production instance at a given time, maintained internally. I realize there will be details specific to my project, so the answer will have to address the general case.

derekv
  • One of your key issues will be 'how many different versions do you have to upgrade from' when you release an upgrade. If all the existing systems are (always) at the same revision level, then there is one set of options; if you have different production systems at different levels, then you have a more complex set of issues to deal with. – Jonathan Leffler Jan 30 '12 at 17:50
  • In our case there will only be one production system at a given time, thus the production database instance contains the up to date data that must be migrated to a new instance or mutated to the new schema. – derekv Jan 30 '12 at 18:47
  • The luxury...that most certainly makes life easier. You probably will want to have a test system as well - separate from the development system - so that you can verify any migration operations on something other than the development machine (where you've already done the migration N times) and the production machine (which you probably don't want to go offline because the migration fails). And, if you expand, you'll end up with multiple systems. But for now, you have the luxury of a relatively (emphasis on _relatively_) simple process. – Jonathan Leffler Jan 30 '12 at 19:11
  • Yes, *one* of the really important benefits of making it automated is that you are testing the deployment in addition to testing the code, as well as reducing the differences between production, UAT, and other testing instances. But the question is how to handle data migration, and whether starting with a dump each time is bad and why. – derekv Jan 30 '12 at 19:59
  • One problem with my question is that in reality it entangles multiple questions. – derekv Feb 02 '12 at 18:38

2 Answers


Most projects I've been on had explicit SQL scripts for schema creation and initial data insertion (plus upgrade scripts with alter statements and data migrations).

There are also tools like Liquibase for managing database changes:

http://www.liquibase.org/bestpractices
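For example, Liquibase can track changes with SQL-formatted changelogs; a minimal sketch (the changeset author/id and object names here are made up):

    --liquibase formatted sql

    --changeset derekv:add-email-to-users
    ALTER TABLE users ADD email VARCHAR(255);
    --rollback ALTER TABLE users DROP COLUMN email;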

Gus
  • How has the data migration typically worked? Is it an update of a live database, an update against some dump file or backup, or an import across two instances? I'm trying to move our project towards automated tier deployments à la continuous delivery. – derekv Feb 01 '12 at 15:49

A lot of bad stuff arises from having different scripts for fresh installs and for upgrades. I worked on Oracle E-Business Suite in the early 2000s, and in my experience the adpatch tool eliminated that fatal variation.

A key technique I absolutely loathed after Oracle acquired my employer was the insistence that all scripts be completely re-runnable and produce no errors at all. Once we got our patch quality up to snuff, I realized it was totally genius.
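The usual way to achieve that is to guard every change so that a second run is a no-op; a minimal sketch in a generic SQL dialect (databases without IF NOT EXISTS, such as Oracle, need an existence check against the data dictionary instead; all names are illustrative):

    -- Re-runnable DDL: creating the table a second time is harmless.
    CREATE TABLE IF NOT EXISTS audit_log (
        id         INTEGER      PRIMARY KEY,
        event      VARCHAR(255) NOT NULL,
        created_at TIMESTAMP    NOT NULL
    );

    -- Re-runnable DML: the WHERE clause makes repeated runs change nothing.
    UPDATE users SET status = 'active' WHERE status IS NULL;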

Another key technique I learned was having strong database comparison/verification scripts.
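One simple form of such a script is a catalog query whose output can be diffed between two environments after a migration; a sketch, assuming an information_schema-style catalog and a hypothetical schema name:

    -- Dump the column inventory in a stable order so the output can be diffed
    -- between, say, UAT and production after running the migration scripts.
    SELECT table_name, column_name, data_type, is_nullable
    FROM   information_schema.columns
    WHERE  table_schema = 'app'
    ORDER BY table_name, column_name;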

If your schema is in good shape, your datasets will most easily look after themselves.

Andrew Wolfe