0

When issuing the msck repair table statement, is the table still accessible for querying during the udpate?

I ask because I'm trying to figure out the best update schedule for a relatively large S3 hive table that is used to drive some reports in QuickSight. Will issuing this command break anyone who happens to simultaneously be running a QuickSight report based on this table?

Robert Penridge
  • 8,424
  • 2
  • 34
  • 55

1 Answers1

1

Yes, the table will be available for running queries while you are running MSCK REPAIR TABLE, it's a background process. Queries run while that command is running will see different partitions, though, as the partitions the command discovers will be added as they are found.

Be aware that running MSCK REPAIR TABLE is a very inefficient process, with many partitions it will run for a very long time, and it is not incremental. This doesn't matter for query performance, but if it takes a long time now, it will only ever take longer and longer and might not be a viable long term strategy. There are some other questions here on StackOverflow about it that you can read to find other strategies for keeping your tables up to date.

Theo
  • 131,503
  • 21
  • 160
  • 205
  • Do you think it's reasonable to use `MSCK REPAIR TABLE` for initial loading of all partitions in a new bucket? In this scenario does it perform worse than if you supplied each as a `PARTITION` with `ALTER TABLE`? To be clear this would be a one time load. The table won't be created til all the data is written and it will effectively be read only after that. – evanmcdonnal Sep 16 '21 at 19:14
  • 1
    Yes, that can work. If there’san enormous amount of partitions it can time out, but you don’t pay for the time it takes to run queries so by all means do it as a one-off thing. – Theo Sep 17 '21 at 18:01