2

I often have a large block of HiveQL that I want to run multiple times with different settings for some variables.

A simple example would be:

set mindate='2015-01-01 00:00:00'
set maxdate='2015-04-01 00:00:00'
select * from my_table where the_date between ${hiveconf:mindate} and ${hiveconf:maxdate}

Which is then run via hive -f myfile.sql > myout.log

Later, I would like to change the variables and re-run. I also want a record of what values the variables had each time I ran.

So I currently make copies of the HiveQL file that are the same except for the variable values. This is obviously error-prone, however, because if I need to change the actual HiveQL, then I have to change it in every file.

Ideally, I could store all my settings a JSON file (or whatever) and have my HiveQL file be totally dynamic. Is there any way to do this?

leftjoin
  • 36,950
  • 8
  • 57
  • 116
Corey
  • 1,845
  • 1
  • 12
  • 23

1 Answers1

1

Set your variables in the config file and load this file in your hql script:

 source /path_to_your_config_file/config.hql; 
leftjoin
  • 36,950
  • 8
  • 57
  • 116
  • But then the name of the config file is a variable that I would want to change... I guess I could reference a softlink in the hql script and change where the link points. – Corey Sep 08 '16 at 18:54
  • You can store all your params in database/hive and generate a file before running a script. Categorize parameters, for example COMMON - for all scripts and SCRIPT_NAME - for particular script. You can easily change params in DB. It's not so convenient to change file location – leftjoin Sep 08 '16 at 19:22