18

I know that MSCK REPAIR TABLE updates the metastore with the current partitions of an external table.

To do that, you only need to do ls on the root folder of the table (given the table is partitioned by only one column), and get all its partitions, clearly a < 1s operation.

But in practice, the operation can take a very long time to execute (or even timeout if ran on AWS Athena).

So my question is, what does MSCK REPAIR TABLE actually do behind the scenes and why?

How does MSCK REPAIR TABLE find the partitions?


Additional data in case it's relevant:

Our data is all on S3, it's both slow when running on EMR (Hive) or Athena (Presto), there are ~450 partitions in the table, every partition has on avg 90 files, overall 3 Gigabytes for a partition, files are in Apache parquet format

John Rotenstein
  • 241,921
  • 22
  • 380
  • 470
gdoron
  • 147,333
  • 58
  • 291
  • 367
  • https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL mentions `ALTER TABLE RECOVER PARTITIONS`. Is it just an alias for `MSCK` or does it do less work? – Piotr Findeisen Dec 07 '18 at 11:27
  • @PiotrFindeisen Seems to be just the equivalent command for EMR. – gdoron Dec 07 '18 at 11:40
  • For all I know, it lists all the partition files and gather some metadata about them. If you have 450 partitions and 90 files per partition, it might be doing 40500 calls to s3 to get every file size individually. I'm not sure if it does more than that, but if it does, it probably runs some statistical analysis on the files as well. If it's the case, you can try with this option: SET hive.stats.autogather=false; How long exactly does it take? Are we talking a around few minutes or a few hours? A few minutes wouldn't shock me. – FurryMachine Dec 08 '18 at 08:45

1 Answers1

23

You are right in the sense it reads the directory structure, creates partitions out of it and then updates the hive metastore. In fact more recently, the command was improved to remove non-existing partitions from metastore as well. The example that you are giving is very simple since it has only one level of partition keys. Consider table with multiple partition keys (2-3 partition keys is common in practice). msck repair will have to do a full-tree traversal of all the sub-directories under the table directory, parse the file names, make sure that the file names are valid, check if the partition is already existing in the metastore and then add the only partitions which are not present in the metastore. Note that each listing on the filesystem is a RPC to the namenode (in case of HDFS) or a web-service call in case of S3 or ADLS which can add to significant amount of time. Additionally, in order to figure out if the partition is already present in metastore or not, it needs to do a full listing of all the partitions which metastore knows of for the table. Both these steps can potentially increase the time taken for the command on large tables. The performance of msck repair table was improved considerably recently Hive 2.3.0 (see HIVE-15879 for more details). You may want to tune hive.metastore.fshandler.threads and hive.metastore.batch.retrieve.max to improve the performance of command.

Vihang
  • 246
  • 2
  • 2
  • Thanks! I'm curious though why even in my table which has only 1 column part partition the operation takes so long? From reading the source code it seems like only the first level needs to be read, which is just 450 files (directories). – gdoron Dec 11 '18 at 05:26
  • Actually you are right. msck is doing a unnecessary listing at a level which is greater than the number of partitions. This can be improved. I have created HIVE-21040 for this. – Vihang Dec 13 '18 at 22:20
  • Thanks a lot for verifying with the source code, finding it and opening the Jira! Would be great if you could update your answer since most people don't bother reading the comments... – gdoron Dec 14 '18 at 08:52
  • 4
    worth noting that `MSCK REPAIR` picks up new paths to perform `ADD PARTITION` but **not the reverse**: if the path of a previously existing `partition` gets deleted then running `MSCK REPAIR` [won't achieve](https://stackoverflow.com/a/44931540/3679900) the effect of `DROP PARTITION`; this will have to be done manually. Also see [this](https://www.quora.com/How-can-I-drop-partition-in-external-table-in-hive) common pitfall – y2k-shubham Feb 28 '19 at 05:28