First, let me point out that I have read all the posts about database versioning, but none of them is exactly what I'm looking for. I couldn't come up with a better title (the word 'full' is the key here).

I have a 'compiled database' that holds all kinds of optimization records and statistics for a public transportation route planner. It is generated by a program from another database. The compiled database never changes while it's active, except for user activity monitoring and caching. Some tables contain as many as 200,000-300,000 records.

This compiled database is fully renewed once a change occurs in the input database. So, any new database version doesn't interact with any other previous version, in any way. But I would like to store each version separately, and have the opportunity to use it from the program if the user desires (think of it like a history of public transportation maps).

The only reasonable method I can see is simply to create a separate physical database for each version, which is neither hard nor wrong. But I'm asking whether you know of any mechanism for versioning full databases (not just parts of the data inside them, as the other posts ask), to make the whole thing more logical and clean.

I'm using SQL Server 2012, but probably on the server it will be 2008 R2.

And if you are thinking of storing versioned data in the same database (adding a VersionID column to each table), forget about it: on a table with 200,000-300,000 records, 10 versions (which would accumulate in less than 3 months) would mean 2-3 million records, of which only 200,000-300,000 would be used. So, no way!

marc_s
Tiborg
  • A VersionID column wouldn't be good anyways since presumably the newer versions could have a different schema. I'd probably go with separate databases like your initial hunch. – Levi W Aug 15 '12 at 21:00
  • Agreed with the comment around schema changes. But why do you think having a version column (because of size) is a problem? – M Afifi Aug 15 '12 at 21:10
  • Just to be contrarian, if the schema is the same between different "versions" of the data, why not have them all in the same database? You could employ table partitioning. Heck... even if they have different schema, they could all be in the same db, just with different table names if that stirs your chili. What's *your* objection to the one database model? – Ben Thul Aug 15 '12 at 21:28
  • @BenThul Indeed, the schema is identical, it never changes. The problem with storing them in the same database is that performance-wise it would be a total catastrophe, because only one version is used in 95% of the cases, but the data itself is only, let's say 10% if I have 10 versions. So 90% is uselessly queried, in 95% of the cases. And I don't think the purpose of partitioning is for versioning, it would be a total mess to optimize the queries (partition elimination and so forth) – Tiborg Aug 15 '12 at 21:54
  • @Tiby: IMO, this is exactly what partitioning is for. Data that changes over time where you have only a subset of it that's actively being used at any given time. As for partition elimination being a mess, if you partition on your VersionID (or whatever you call it) and use that in your query for the data, the server does the elimination for you. – Ben Thul Aug 16 '12 at 12:06

1 Answer

I think the multi-database approach is good. You could try storage-level data deduplication, which would probably cut down storage usage considerably. Just make sure that you create each new database by backing up and restoring the old one, so that the two are mostly byte-identical.
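
The backup-and-restore step could look roughly like the T-SQL sketch below. It is only an illustration under stated assumptions: the database names RoutePlanner_v1 and RoutePlanner_v2, the logical file names, and all file paths are hypothetical (RESTORE FILELISTONLY reports the real logical names of a backup).

```sql
-- Hypothetical names and paths throughout; adjust to the actual server layout.

-- 1. Back up the currently active version.
BACKUP DATABASE RoutePlanner_v1
    TO DISK = N'C:\Backups\RoutePlanner_v1.bak'
    WITH INIT;

-- 2. Restore it under a new name to serve as the starting point for the next version.
--    MOVE relocates the data and log files so they do not collide with version 1.
RESTORE DATABASE RoutePlanner_v2
    FROM DISK = N'C:\Backups\RoutePlanner_v1.bak'
    WITH MOVE N'RoutePlanner_v1'     TO N'C:\Data\RoutePlanner_v2.mdf',
         MOVE N'RoutePlanner_v1_log' TO N'C:\Data\RoutePlanner_v2.ldf';
```

The program can then pick a version either through its connection string or with a three-part name such as RoutePlanner_v2.dbo.SomeTable.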

usr
  • Deduplication is a solution for reducing used space, but I don't think that's really an issue at the moment, because even though I have tables with more than 100,000 records, the columns are mainly tinyints and smallints, so a database doesn't exceed 50-100 MB. I'm primarily interested in 'linking' databases as separate entities that share a common schema. – Tiborg Aug 15 '12 at 22:04
  • OK, that's not possible cross-database. You can keep multiple versions as separate partitions of the same table. You can introduce a partitioning key "VersionID smallint not null". Performance will be the same as now if you always access by VersionID. – usr Aug 15 '12 at 22:07
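
The partitioning idea discussed in the comments could be sketched in T-SQL roughly as follows. This is only an illustration, not a tested design: the names pfVersion, psVersion, dbo.RouteStat and its columns are hypothetical, and every partition is placed on the PRIMARY filegroup for simplicity. Note that on SQL Server 2008 R2 and 2012, table partitioning is an Enterprise edition feature.

```sql
-- Hypothetical names throughout; one partition per compiled version.

-- Boundary values are version numbers; RANGE LEFT puts each listed value
-- in its own partition (<=1, 2, 3, and everything above 3).
CREATE PARTITION FUNCTION pfVersion (smallint)
    AS RANGE LEFT FOR VALUES (1, 2, 3);

-- Map every partition to the PRIMARY filegroup.
CREATE PARTITION SCHEME psVersion
    AS PARTITION pfVersion ALL TO ([PRIMARY]);

-- The partitioning column must be part of the clustered primary key.
CREATE TABLE dbo.RouteStat
(
    VersionID     smallint NOT NULL,
    StopID        int      NOT NULL,
    TravelMinutes smallint NOT NULL,
    CONSTRAINT PK_RouteStat PRIMARY KEY CLUSTERED (VersionID, StopID)
) ON psVersion (VersionID);

-- Filtering on VersionID lets the server eliminate the other partitions.
DECLARE @VersionID smallint = 3;
SELECT StopID, TravelMinutes
FROM dbo.RouteStat
WHERE VersionID = @VersionID;

-- Before loading version 4, add a new boundary so it gets its own partition.
ALTER PARTITION SCHEME psVersion NEXT USED [PRIMARY];
ALTER PARTITION FUNCTION pfVersion() SPLIT RANGE (4);
```

As long as every query supplies VersionID, only the partition for that version is read, which is the point made above about performance staying the same as with a single-version table.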