
I am using AWS EMR clusters to run Hive. I want to enforce that certain tables, such as reference tables, are never empty after their initial creation. If one of them is found to be empty, the job should throw an error (or log a message) and stop processing.

Does anyone know of any ways to achieve this?

Thanks

cbradsh1

1 Answer


You could install a cron job on the master node that periodically runs a check against your Hive table. If the table is found to be empty, you can terminate the cluster, stop the job flow, or take some other action. These actions can be executed with the EMR CLI tools: http://docs.aws.amazon.com/ElasticMapReduce/latest/DeveloperGuide/emr-cli-commands.html
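A minimal sketch of such a check, assuming the `hive` CLI is on the PATH (as it is on an EMR master node) and that the table names passed in are trusted, since they are interpolated directly into the query. It runs `SELECT COUNT(*)` in Hive's silent mode (`-S`) and reports a non-zero exit code if any table is empty. The function names here are hypothetical, and the `run` parameter exists only so the Hive call can be swapped out for testing.

```python
import subprocess
import sys


def count_rows(table, run=subprocess.run):
    """Return the row count of a Hive table via the Hive CLI (assumes `hive` is on PATH)."""
    result = run(
        ["hive", "-S", "-e", f"SELECT COUNT(*) FROM {table};"],
        capture_output=True, text=True, check=True,
    )
    # In silent mode Hive prints just the query result; use the last non-empty line.
    lines = [line.strip() for line in result.stdout.splitlines() if line.strip()]
    return int(lines[-1])


def find_empty_tables(tables, run=subprocess.run):
    """Return the subset of `tables` that currently contain no rows."""
    return [t for t in tables if count_rows(t, run) == 0]


def main(tables, run=subprocess.run):
    """Return 0 if every table has rows, 1 (after logging) if any table is empty."""
    empty = find_empty_tables(tables, run)
    if empty:
        print("ERROR: empty reference table(s): " + ", ".join(empty), file=sys.stderr)
        return 1
    return 0
```

Saved as a script with `sys.exit(main(sys.argv[1:]))` appended, it can be scheduled from cron or run once right after the reference tables are created; the non-zero exit status is the signal to stop processing or terminate the cluster.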

The same actions can also be performed through the AWS SDK from a Java program, in case you would rather implement all of this in Java than as a script.
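The answer refers to the Java SDK; to keep the examples in one language, here is a sketch of the equivalent call using boto3, the AWS SDK for Python, which exposes the same EMR `TerminateJobFlows` API as `terminate_job_flows`. It assumes boto3 is installed and AWS credentials are configured; the helper name `stop_cluster` and the cluster ID are hypothetical placeholders.

```python
def stop_cluster(cluster_id, emr_client=None):
    """Terminate an EMR cluster (job flow) through the TerminateJobFlows API.

    Assumes boto3 is installed and credentials are configured when no client is given.
    """
    if emr_client is None:
        import boto3  # imported lazily so the helper can be tested with a stub client
        emr_client = boto3.client("emr")
    # The call is asynchronous: it requests termination and returns immediately.
    emr_client.terminate_job_flows(JobFlowIds=[cluster_id])
```

For example, `stop_cluster("j-XXXXXXXXXXXXX")` would be called after a failed emptiness check, with the placeholder replaced by your cluster ID.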

You have not specified whether the cluster is persistent or transient. If it is persistent, this script can also run outside the master node.

user1452132
  • Thanks for the answer. I added extra details in bold. I'm not so interested in scanning these tables periodically; I want more of an initial assertion that they are not empty after their initial creation (i.e. external reference tables). – cbradsh1 Sep 10 '14 at 18:11