I often have a large block of HiveQL that I want to run multiple times with different settings for some variables.
A simple example would be:
set mindate='2015-01-01 00:00:00';
set maxdate='2015-04-01 00:00:00';
select * from my_table where the_date between ${hiveconf:mindate} and ${hiveconf:maxdate};
I then run this via hive -f myfile.sql > myout.log
Later, I would like to change the variables and re-run. I also want a record of what values the variables had each time I ran.
So I currently make copies of the HiveQL file that are identical except for the variable values. This is error-prone, because whenever I need to change the actual HiveQL, I have to make the same change in every copy.
Ideally, I could store all my settings in a JSON file (or whatever) and have my HiveQL file be totally dynamic. Is there any way to do this?
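To make the idea concrete, here is a rough sketch of the kind of wrapper I have in mind, not something I have working. It assumes the settings live in a JSON object whose values are stored exactly as they should be substituted (including the single quotes), and it passes each one to the Hive CLI with --hiveconf so the .sql file itself never changes. The function names, the script name, and the settings file layout are all made up for illustration.

```python
import json
import subprocess
import sys


def build_hive_command(settings, script_path):
    """Build the hive argv: one --hiveconf name=value pair per setting, then -f script."""
    command = ["hive"]
    # Sort the names so the same settings always produce the same command line.
    for name, value in sorted(settings.items()):
        command += ["--hiveconf", f"{name}={value}"]
    command += ["-f", script_path]
    return command


def run_with_settings(settings_path, script_path, log_path):
    """Run the HiveQL script with values from a JSON file and log what was used."""
    with open(settings_path) as handle:
        settings = json.load(handle)
    command = build_hive_command(settings, script_path)
    with open(log_path, "w") as log:
        # Record the variable values at the top of the log for this run.
        log.write("-- settings: " + json.dumps(settings, sort_keys=True) + "\n")
        # Flush before hive starts writing to the same file descriptor.
        log.flush()
        subprocess.run(command, stdout=log, check=True)


if __name__ == "__main__" and len(sys.argv) == 4:
    # Hypothetical usage: python run_hive.py settings.json myfile.sql myout.log
    run_with_settings(sys.argv[1], sys.argv[2], sys.argv[3])
```

With something like this, settings.json would contain {"mindate": "'2015-01-01 00:00:00'", "maxdate": "'2015-04-01 00:00:00'"}, the two set lines would be removed from myfile.sql, and each run would leave a log that starts with the values it used.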