14

I am part of a development team currently working with a database that does not have any kind of source control. We work with SQL Server 2008 R2 and have always managed the DB directly with SSMS. It now has ~340 tables and ~1600 stored procedures, plus a few triggers and views, so it is not a small DB.

My goal is to have the DB under version control, so I have been reading articles, like Scott Allen's series and many old SO related questions. But I am still unable to decide on how to proceed.

What I'm thinking of is to script the database schema in one file, then procedures, triggers and views in one file each. Then keep everything versioned under Mercurial. But of course, every member of the team can access SSMS and directly change the schema and procedures, with the possibility that any of us can forget to replicate those changes in the versioned files.

What better options are there? And, did I forget any element worth having source control of? My biggest concern is that most of the literature I found explains how to do version control when creating a new database, but not when it is already old and relatively big.

ederbf
  • 1,713
  • 1
  • 13
  • 18
  • There are some (commercial) tools out there that can manage schema information like what you describe and do a diff between a DB and some previously saved schema state... which makes it possible to detect changes done outside the proper way... – Yahia Nov 07 '12 at 17:08

5 Answers5

14

The General Process

We create a baseline for a particular version (say, v1.0). A baseline includes one complete schema creation script, as well an upgrade script from allowed previous versions, if any (more on that in a moment). So for v1.0, we'd have just one script:

baseline-v1.0.sql

From that baseline, we create incremental change scripts as we work from the previous baseline. These scripts are created in a way that they are reentrant, so that they can be run safely multiple times (where the first time only does any actual work; see the next paragraph on a suggestion how). We just create a file for each change script with the baseline name and a timestamp (which we call the version). So for example, say we create two change scripts after a baseline. We'd have the following files:

baseline-v1.0.sql (for creating new installations)
baseline-v1.0-201211071220.sql (created on Nov. 7, 2012 at 12:20 PM UTC)
baseline-v1.0-201211122019.sql (created on Nov. 12, 2012 at 8:00 PM UTC)

We create a schema_version table that has two columns: baseline and version. baseline is some label (such as v1.0 mentioned above), and version is just a timestamp of when the change script was created (we chose to do this because creating arbitrary version numbers created annoying administrative overhead, where a timestamp was easy to use). So before running the change script, we check to see if the change script has been applied yet, by querying for it by baseline and version. If it's already present, just return out of the script or whatever. Otherwise, apply the change and insert into the schema_version table to mark the change script completed.

Example change script:

-- Created by <developer> on Nov. 7, 2012 at 12:20 PM UTC
declare @schema_baseline varchar(10), @schema_version varchar(12)

set @schema_baseline = 'v1.0'
set @schema_version = '201211071210'

if exists (select 1 from schema_version where baseline = @schema_baseline and version = @schema_version = @schema_version) return 0

-- begin change script

-- place your schema changes here

-- end change script

insert into schema_version(@schema_baseline, @schema_version)

Now, when we actually install the software, we run the relevant baseline script. As we upgrade that version, we just apply the change scripts in order.

When we hit a significant milestone in our product development phase, we create a new baseline. So, we create a new baseline script (again, this is a snapshot of the DB as a baseline), plus an upgrade script from the previous baseline. So let's say we have a new baseline, v2.0, we'd have the following files:

baseline-v2.0.sql (for creating new installations)
baseline-v2.0-upgrade-v1.0.sql (for upgrading from v1.0)

Then the process continues.

How We Apply Changes

The scripts are all kept in source control. We do have a tool that packages these files and automatically upgrades databases, which our support and installation teams use. The tool figures out the current baseline of the target database, and asks the user if they wish to upgrade to the baseline in the package. If they do, and there is a valid upgrade path from the current version, it applies the upgrade script, and updates the schema_version.baseline, and deletes all entries for change scripts from the previous baseline. If the database is new, it applies the regular baseline script. Either way, after the baseline is achieved, it applies all change scripts from the baseline that are present in the package, one at a time, in order, in a transaction. If a particular change script fails, it rolls back the last set of changes and errors out. We look at the log, fix any issues, then rerun the package again. At that point, it should just pick up at the last change script that succeeded, saving time.

Automation and Diff Tools

We do not allow diff tools to upgrade production databases directly. It's just too risky. We do use diff tools, of course, to help create our upgrade and change scripts, but once we have them, we comb through them, massage them, test them, etc., then create the upgrade or change script according to the specs above. We do use tools/shell scripts to create the change script files and put the boiler plate schema_version checking.

Caveats

It's actually pretty straight-forward and it works well. The only time it really gets tricky is with branches. For the most part, branches are handled well. If we need a change script for a particular branch's work, it will fold into the mainline very well once we merge the branch back in. No problem. Where it gets tricky is when two branches try to do similar things, or where one branch relies on another. That's mostly a process and planning issue, though. If we get stuck in such a situation, we just create a new baseline (say v2.1), then update the branches accordingly.

Another thing to keep in mind is if an installation wants to be upgraded from one baseline to another, it has to apply all outstanding changes for the current baseline, before we upgrade to the new one. In other words, we don't let installations jump right from where ever they are to the next baseline (unless, of course, they're already at the most recent version for the current baseline).

moribvndvs
  • 42,191
  • 11
  • 135
  • 149
11

I would recommend SQL Server Data Tools and/or a Visual Studio SQL database project. It will reverse engineer your existing DB to code(sql) files that can be version controlled and gives many other niceties (publishing, comparison, etc)

StingyJack
  • 19,041
  • 10
  • 63
  • 122
5

We developed SQL Source Control specifically to solve the problem you describe. It extends SSMS to provide a link between your SQL Server schema objects (and static data) and your existing source control system.

http://www.red-gate.com/products/sql-development/sql-source-control/

If you need any more information, we'd be very pleased to help (contact support@red-gate.com)

David Atkinson
  • 5,759
  • 2
  • 28
  • 35
  • I have a question. Supposing there are 2 developers. Developer A has SSMS extended with red-gate but Developer B does not. If developer B makes changes with his SSMS, will red-gate be able to still track the changes made by Developer B? – Harvey Darvey Nov 18 '15 at 16:45
  • Yes, this works. Changes are detected by querying the database, not anything proprietary in the tool. – David Atkinson Nov 20 '15 at 18:33
3

There have been many discussions regarding this topic on many developer forums.

What I have done and found to be the simplest and cleanest way to do is this:

  1. Extract every DB object's DDL into its own file, indexes and PKs can go in the same file as the table they belong to. FKs, procedures, views, triggers, anything that can go across multiple tables go in their own file.

  2. Organize the DDL files in dirs per object type (e.g. table, procedure, trigger, view etc.)

  3. For tables holding static reference data (e.g. zip code or state), have a separate file with a bunch of insert statements

  4. Check this directory structure into whatever version control you are using

  5. Write a script that will traverse this directory structure that images your DB, diff it against the actual DB you point to (extracting the schema from the system tables) and apply the diffs using ALTER TABLE statements

  6. In case you have data transformations between releases, e.g. in v1 you had a field FirstAndLastName and in v2 you decided to split it into FirstName and LastName, you will have some bulk data migration/processing statement.

I have successfully managed DB changes in several jobs using several different RDBMSs. I usually use Perl for the script that diffs the DB schema and the DDL files in your image. There are some assumptions to this method and one of them is that you never make any changes to the DB directly in the DB but in your DDL files and then run the script to apply it. If you do it the other way, they will be undone when you run the script. So it requires some team agreement and discipline. Your milage may vary.

Now, if there is a FOSS tool out there that will do this for you, by all means use that rather than devising your own. I've been doing things this way for more than 10 yrs

amphibient
  • 29,770
  • 54
  • 146
  • 240
-1

Our Sql Historian source control system can help folks with this problem, especially in the situation you mention where teammates"forget" to check in code after they've updated the server.

It sits in the background and records all changes made to your db objects into source control, without users needing to check anything in. Think of it like an airplane blackbox recorder, staying out of the way until you need it.

jlee-tessik
  • 1,510
  • 12
  • 16