I have found lots of answers of HOW to drop a DB and all its tables, but nothing around why it takes ~3-4 seconds per table to drop, seemingly in SERIAL (one after the other).
I have a database with 2,414 EXTERNAL Tables pointed at parquet locations, and DROP DATABASE <db> CASCADE;
can take 1-2 HOURS just to drop the metadata for the DB.
In a separate session I can repeatedly SHOW TABLES IN <deleted DB>;
and watch the count of tables go down at a rate of about 1 every 3-4 seconds. This takes upwards of 2 hours before the session releases the delete and allows us to replace the DB with a new one...
504 rows selected (0.29 seconds)
...
503 rows selected (0.17 seconds)
...
502 rows selected (0.29 seconds)
...
What is taking Hive so long? Is there a configuration I can use to make it quicker? Is there a way I can tell what it's doing during that time?
I would think others would have encountered this problem if it were more common, so that makes me think I have a setting somewhere I can tweak to fix this (?)...
The parquets don't seem to be deleted underneath the dropped database, so it doesn't seem to have anything to do with hdfs/parquet files unless dropping an external table checks them for any reason...
Any ideas why it would be so slow?