How can I keep order of refactors using database refactoring software?

Question

I have been trying to use either Liquibase or DBDeploy. I'm more drawn toward Liquibase because of its non-SQL interface (i.e. I can just use JSON or YAML changesets). However, I have the same problem with both tools.

Liquibase Workflow

1. I create a master changelog. All it does is includeAll a folder that contains small files with small changesets inside.
2. I then create changesets and prefix them with a number (e.g. a timestamp, or a simple integer like 1, 2, etc.).

DBDeploy Workflow

I just start making delta SQL files, numbered with the same prefix strategy as in step 2 of the Liquibase workflow above.

The problem

Well, the problem is so trivial I'm feeling stupid for asking it, but here it goes. Consider this scenario:

1. I make a branch to work on my feature, say, adding orders to the system.
2. My colleague, Bob, makes his own branch to add products to the system.
3. When it comes time to merge, there is no telling whose changesets or delta SQL files will run first. This may break the database.

Doesn't this happen to anyone? If so, how is it solved in PHP land?

Thanks.

Jens · Answer 1 · 2014-05-12T17:34:26.170

(I only know Liquibase, so my answer applies only to Liquibase.)

As the includeAll documentation states, the files are run in alphabetical order, so I'd expect your numbering to be sufficient to put the files in the right order. However, you and Bob will still have to coordinate those numbers to decide what has to run first.
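One thing to watch with alphabetical ordering is that plain integer prefixes stop sorting numerically once they reach two digits. Here is a minimal Python sketch of that effect. The file names are hypothetical, and Python's string sort only stands in for an alphabetical listing; it is not Liquibase's own code.

```python
# Hypothetical changeset file names, used only for illustration.
unpadded = ["1-create-products.yaml", "2-create-orders.yaml", "10-add-order-index.yaml"]
padded = ["001-create-products.yaml", "002-create-orders.yaml", "010-add-order-index.yaml"]


def alphabetical_run_order(filenames):
    """Return the names in plain alphabetical (string) order.

    Assumption: this approximates an alphabetical directory listing.
    """
    return sorted(filenames)


# With unpadded integers, "10-..." sorts before "2-...".
print(alphabetical_run_order(unpadded))

# With zero-padded prefixes, alphabetical order matches numeric order.
print(alphabetical_run_order(padded))
```

Zero-padded numbers or fixed-width timestamps avoid the surprise, because every prefix then has the same length.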

We don't use includeAll though. We include files into the master changelog manually. So whoever wants to change the database has to take care of including this into the master changelog. If there are two changes the developer that comes last has to make sure to include/merge his changes at the right place in the master change log.

EDITED - to explain the include mechanism we use

During development we just throw away the whole database and let Liquibase create it from scratch whenever we change anything (on our development database). We always check in the changeset file to our code repository. This way we track everything that was done in development, since we can check out any version of the changeset and let Liquibase create the database with it.

Only when development of the version we are working on (or the sprint) is done do we actually let the changesets run and change the production database.

Then this cycle is repeated again and again.

This way only the final db changes are tracked with liquibase and not all the tryouts that might take place during development.

During development you could end up changing a table multiple times because you try out different things. You add a column, then realize the whole thing will not work, so you remove the column again and add another one. Why would you want to have all these changes in the databasechangelog? It would fill up with unnecessary stuff - just like you were afraid of.

Hope this clears things up. But in the end you might have a totally different development approach, so feel free to use Liquibase just the way you need it.

There is no reason to fill the databasechangelog table with all the ideas during development. Only when we are done with development and release a version

Doesn't this make the changelog grow very, very big? We are an agile team with a small project in its first development phase, so that means the database is going to undergo a lot of changes, maybe sometimes even small bits of redesign. — Parham Doustdar, May 12 '14 at 13:04
@ParhamDoustdar You do not have to track every change within the development phase. We use it this way: within development you can change the changeset as often as you like. E.g. you add a new table and in the same "sprint" you realize it needs more columns, then you would change the changeset that creates the table. Only at the end of a sprint do you consider the changesets fixed and release an official version (e.g. 0.1). From then on you will have to make a new changeset if that same table needs more columns. You can/should check in your changeset files to the code repository though... — Jens, May 12 '14 at 13:20
Let me make sure I'm understanding you correctly. What you mean is to include all changesets and set them to run if changed. Then, instead of creating different changesets, we just change the ones we've already made in this sprint, and start creating releases. Is that right? If so, why do we need to create releases? Why not just keep adding changesets to a single changelog? Is it to break the changelogs to smaller, more manageable chunks? — Parham Doustdar, May 12 '14 at 17:05
I have extended my answer since the comments only allow a certain number of chars... Hope this clears things up. — Jens, May 12 '14 at 17:35
I'm still lost. You say that you don't modify the database manually and let Liquibase make modifications. Great. But then, you say that changesets aren't run until the end of a release when the schema has stabilized. These seem to be contrary. Could you please mention an exemplary, more specific workflow in your answer? Thanks. — Parham Doustdar, May 12 '14 at 18:06
I think this is getting way too complicated to be explained in comments. To talk about the overall philosophy of how Liquibase can be used, we should move to the Liquibase forum. This should go back to your original question about how to tell which changesets run first when using `includeAll`. — Jens, May 13 '14 at 11:59

score 2 · Accepted Answer · answered May 12 '14 at 13:57

I've used a couple of different migration frameworks - Liquibase, Rails, and something from the C# world called Tarantino. Each used a similar strategy, where changes were recorded in separate files. Each was on a small team (<= 5 developers).

Rails is the most dogmatic about how files are named, and is where I have had the most conflicts due to branches. Most of those conflicts were name based rather than logical conflicts in the database.

In projects using Liquibase, we used a master/include pattern, and developers did their work on branches. Because the file names had a 3-digit sequence number plus a brief description (e.g. 009-add-customer-index.xml), we did not have name conflicts. We avoided database-level conflicts mainly by just talking to each other - daily standups, etc.

We had similar experiences using Tarantino, although it just uses a directory full of files for its migrations. As with Liquibase, we adopted a naming convention that kept things in order. Even when two changesets had the same number, they would have different names, and 99.9% of the time those two changesets did not depend on each other, so their relative order did not matter.
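For the rare case where the order does matter, it can help to notice after a merge that two changesets ended up with the same number. This is a rough Python sketch of such a check, not something from either tool. The function name and the file names are hypothetical, and it assumes every changeset file starts with its sequence number.

```python
import re
from collections import defaultdict


def find_number_collisions(filenames):
    """Group changeset files by leading digits; return groups with more than one file."""
    by_number = defaultdict(list)
    for name in filenames:
        match = re.match(r"(\d+)", name)  # leading sequence number, if any
        if match:
            by_number[match.group(1)].append(name)
    return {num: sorted(names) for num, names in by_number.items() if len(names) > 1}


# Hypothetical directory contents after merging two feature branches.
merged = [
    "008-create-customer.xml",
    "009-add-customer-index.xml",
    "009-add-products.xml",
]

print(find_number_collisions(merged))
```

A collision is not an error by itself. It is only a prompt for the two developers to talk and decide whether one changeset has to run before the other.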

Just for a datapoint, the project that used Tarantino leveled off with about 400 changesets.

I was referring to what you mentioned in your original question, that you "create a master changelog. All it does is to includeAll a folder which contains small files with small changesets inside." — SteveDonie, May 13 '14 at 18:38
