
I have found lots of answers about HOW to drop a DB and all its tables, but nothing about why it takes ~3-4 seconds per table to drop, seemingly in SERIAL (one after the other).

I have a database with 2,414 EXTERNAL tables pointing at parquet locations, and DROP DATABASE <db> CASCADE; can take 1-2 HOURS just to drop the metadata for the DB.

In a separate session I can repeatedly run SHOW TABLES IN <deleted DB>; and watch the count of tables go down at a rate of about 1 every 3-4 seconds. It takes upwards of 2 hours before the session finishes the drop and allows us to replace the DB with a new one...

504 rows selected (0.29 seconds)
...
503 rows selected (0.17 seconds)
...
502 rows selected (0.29 seconds)
...

What is taking Hive so long? Is there a configuration I can use to make it quicker? Is there a way I can tell what it's doing during that time?

I would think others would have encountered this problem if it were common, which makes me think I have a setting somewhere I can tweak to fix this (?)...

The parquet files don't seem to be deleted underneath the dropped database, so the slowness doesn't seem to have anything to do with HDFS/parquet files, unless dropping an external table checks them for some reason...
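
For reference, a table's type can be confirmed with a formatted describe; this is just a sketch, and the table name below is a placeholder:

    -- Check the 'Table Type' line in the output: EXTERNAL_TABLE means the drop
    -- only removes metastore entries and leaves the parquet data in place.
    -- 'my_db.some_table' is a placeholder name.
    DESCRIBE FORMATTED my_db.some_table;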

Any ideas why it would be so slow?

Rimer

1 Answer


AFAIK, Hive has to drop all of each table's references from the metastore. Even though they are external tables, there may be a lot of partition entries, column statistics, etc. to delete, and if a table has a lot of rows it may also need to acquire specific locks. You may want to check the metastore backend (MySQL or equivalent) and see if you can introduce any indexes or collect stats on a periodic basis.
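
As a starting point, here is a minimal diagnostic sketch against a MySQL-backed metastore, assuming the standard schema (DBS, TBLS, PARTITIONS); 'my_db' is a placeholder for the database being dropped, and table names can vary slightly between Hive versions:

    -- How many partition entries each table in the database still has in the
    -- metastore; re-running this while the drop is in progress shows which
    -- tables have already been removed and which are heaviest.
    SELECT t.TBL_NAME,
           t.TBL_TYPE,
           COUNT(p.PART_ID) AS partition_count
    FROM DBS d
    JOIN TBLS t            ON t.DB_ID  = d.DB_ID
    LEFT JOIN PARTITIONS p ON p.TBL_ID = t.TBL_ID
    WHERE d.NAME = 'my_db'
    GROUP BY t.TBL_NAME, t.TBL_TYPE
    ORDER BY partition_count DESC;

    -- In MySQL, SHOW INDEX FROM PARTITIONS; shows whether the usual index on
    -- TBL_ID is in place for your schema version.

If the counts are large, or the backend is missing the expected indexes, that would go some way toward explaining a few seconds per table.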

user3327034