Background info: we run an e-commerce website and are trying to figure out the best way to deal with "historical" data for tables that are used very frequently and are bound to contain a lot of records (i.e. orders, customers, etc.).
I am particularly looking at 2 specific scenarios:
- DB migrations
- SELECTs
DB migrations
In the case of DB migrations, we are starting to see that we sometimes need to run an ALTER TABLE that locks the whole table and, when the table has that many records, this can take a while. Of course, all operations on the table are on hold until the migration is done, which means that our checkout might be down just because we are changing a VARCHAR(15) to a VARCHAR(256).
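For reference, this is the kind of migration I mean (table and column names are made up):

    -- hypothetical example: widening a column on a big, hot table;
    -- on MySQL 5.6 this rebuilds the whole table with ALGORITHM=COPY,
    -- blocking writes for the duration
    ALTER TABLE orders MODIFY shipping_code VARCHAR(256) NOT NULL;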
As of MySQL 5.6, a lot of operations are done "INPLACE", which means (from what I understand) that they won't create a full-table lock: that's ok-ish but still not perfect -- what if we need to change the type of a column (which cannot be executed INPLACE) and we really do not want to be in maintenance mode for minutes?
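As far as I know, since 5.6 you can at least make MySQL fail fast instead of silently falling back to a copying, locking migration (same hypothetical table as above):

    -- errors out immediately if the change cannot be done in place
    -- and without locking, instead of blocking the table for minutes
    ALTER TABLE orders
        MODIFY shipping_code VARCHAR(256) NOT NULL,
        ALGORITHM=INPLACE, LOCK=NONE;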
My super-ghetto idea was to simply replicate the table (copy it over), then execute the migration on the copied table, stop writing to the original table (i.e. lock it), copy over the rows that weren't synced yet and swap the two tables. I think Percona's tool for zero-downtime migrations (pt-online-schema-change) does something similar, so maybe that's the "best" approach?
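Very roughly, something like this (simplified sketch with made-up names; the "catch up on writes" step is the hard part, which pt-online-schema-change solves with triggers):

    -- 1. create a copy with the new schema
    CREATE TABLE orders_new LIKE orders;
    ALTER TABLE orders_new MODIFY shipping_code VARCHAR(256) NOT NULL;

    -- 2. backfill in batches while writes still go to `orders`
    INSERT INTO orders_new SELECT * FROM orders WHERE id BETWEEN 1 AND 1000000;
    -- ...repeat for the remaining id ranges, then lock `orders` and
    -- copy whatever was written during the backfill

    -- 3. atomically swap the two tables (RENAME TABLE is atomic)
    RENAME TABLE orders TO orders_old, orders_new TO orders;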
Opinions?
SELECTs
For SELECTs, since most of the old data is accessed very rarely, I thought of range-partitioning the tables by date (say pre-2015 / post-2015, for example) and then changing most of our queries to fetch stuff WHERE YEAR(created_at) >= 2015.
If the user wants their full historical data then we'd dynamically remove that condition. This way, most queries would only ever touch the "recent" partition.
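One thing I'm not sure about: as far as I know MySQL cannot prune partitions when the WHERE clause wraps the partitioning column in a function, so we would probably have to filter on the raw column instead. Something like this (schema simplified; note the primary key has to include the partitioning column):

    CREATE TABLE orders (
        id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
        created_at DATETIME NOT NULL,
        -- ...other columns...
        PRIMARY KEY (id, created_at)
    )
    PARTITION BY RANGE (YEAR(created_at)) (
        PARTITION p_old  VALUES LESS THAN (2015),
        PARTITION p_2015 VALUES LESS THAN (2016),
        PARTITION p_max  VALUES LESS THAN MAXVALUE
    );

    -- prunes: the optimizer only scans p_2015 and p_max
    SELECT * FROM orders WHERE created_at >= '2015-01-01';

    -- does NOT prune: the function call hides the column from the optimizer
    SELECT * FROM orders WHERE YEAR(created_at) >= 2015;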
Any other ideas? Do you think partitioning might be worth it?