We have a recurring process in which we want, and need, to clean up our databases. Every client or prospect gets their own database (300 tables, and growing every month), which is spun up within seconds and seeded with some basic data.
After several months, the databases need to be cleaned up. We simply run `DROP DATABASE customer_1` for each database (giving the MySQL server 10 seconds between statements to 'rest'), followed by `DROP USER 'customer_1'@'127.0.0.1'`.
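The loop can be sketched as follows. The `customer_<n>` naming, the credentials, and the 10-second pause are from our process; `MYSQL_CMD` is a placeholder that defaults to `echo` so the sketch dry-runs (point it at the real client to actually execute):

```shell
# Sketch of the cleanup loop described above. MYSQL_CMD defaults to
# `echo` for a dry run; set e.g. MYSQL_CMD="mysql -u adm-user -p..."
# to run it for real.
MYSQL_CMD="${MYSQL_CMD:-echo}"
PAUSE="${PAUSE:-10}"   # seconds of 'rest' between statements

drop_customer() {
    $MYSQL_CMD -e "DROP DATABASE \`customer_$1\`"
    sleep "$PAUSE"
    $MYSQL_CMD -e "DROP USER 'customer_$1'@'127.0.0.1'"
    sleep "$PAUSE"
}
```

Each customer is then cleaned up with a call like `drop_customer 1`.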
Every so often, the entire database server just hangs. `SHOW PROCESSLIST` gives:

```
Id     User      Command  Time  State        Info
[pid]  adm-user  Query    300   System lock  DROP DATABASE `customer_1`
```
No new queries complete after that. Killing the relevant query id results in `Command=Killing`, and that's it -- nothing happens. The MySQL daemon cannot be stopped either, because it's still waiting for the query to complete.
We've resorted to powering off the entire server, restarting it, and letting MySQL do its automated crash recovery, which works fine. After that, we can drop another 10-30 databases before this repeats itself.
We've read plenty on the subject, including but not limited to:
- https://www.percona.com/blog/2011/02/03/performance-problem-with-innodb-and-drop-table/
- https://www.percona.com/blog/2009/06/16/slow-drop-table/
- https://dba.stackexchange.com/questions/41995/drop-database-locked-the-server
The consensus seems to be that, yes, MySQL holds a global mutex lock on the table(space) during the drop, and that this becomes slow when combined with a large buffer pool size.
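One mitigation we're considering, based on that reading, is to shorten each mutex hold rather than avoid it: drop the 300 tables one at a time before dropping the (then nearly empty) database. This is an assumption on our part, not something the posts above guarantee. A dry-runnable sketch, with `MYSQL_CMD`, `PAUSE`, and the table list as placeholders:

```shell
# Hedged sketch: drop tables individually so each drop holds the
# global mutex briefly, instead of one long DROP DATABASE.
# MYSQL_CMD defaults to `echo` so this dry-runs.
MYSQL_CMD="${MYSQL_CMD:-echo}"
PAUSE="${PAUSE:-1}"   # breather between per-table drops

drop_db_gently() {
    db="$1"; shift      # remaining args: the database's tables
    for t in "$@"; do
        $MYSQL_CMD -e "DROP TABLE \`$db\`.\`$t\`"
        sleep "$PAUSE"
    done
    # The database is now empty, so this final drop should be quick.
    $MYSQL_CMD -e "DROP DATABASE \`$db\`"
}
```

In practice the table list would come from `information_schema.tables` for the schema being dropped.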
Our my.cnf (relevant part):

```
innodb_file_per_table = 1
innodb_buffer_pool_size = 9G
innodb_log_file_size = 256M
innodb_flush_method = O_DIRECT
table_open_cache = 200000
table_definition_cache = 110000
innodb_flush_log_at_trx_commit = 2
```
Is there any way we can drop databases responsibly -- i.e., without taking the server down for the other prospects?
I've read that simply removing all table files first, and dropping the database afterwards, could work; MySQL should then only have to remove its references to the database.
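That idea might look like the sketch below. It assumes `innodb_file_per_table = 1` (as in our my.cnf) so each table has its own `.ibd` file; `DATADIR` is a placeholder for the MySQL data directory and `MYSQL_CMD` defaults to `echo` for a dry run. Deleting files behind InnoDB's back is risky, so this is an illustration of the idea, not a recommendation:

```shell
# Very hedged sketch of the file-removal idea: delete the per-table
# data files first, so the subsequent DROP DATABASE has little left
# to do. InnoDB will likely log missing-file errors but proceed.
DATADIR="${DATADIR:-/var/lib/mysql}"
MYSQL_CMD="${MYSQL_CMD:-echo}"

purge_db_files() {
    db="$1"
    rm -f "$DATADIR/$db"/*.ibd          # per-table data files
    $MYSQL_CMD -e "DROP DATABASE \`$db\`"
}
```

Is this approach actually safe, or is there a better-supported way?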