0

I'm using HPD-2.4.2 and try to add partitions to an external Hive table using an Oozie coordinator job. I created a coordinator that daily tiggers the following workflow:

<workflow-app name="addPartition" xmlns="uri:oozie:workflow:0.4">
  <start to="hive"/>
  <action name="hive">
   <hive2 xmlns="uri:oozie:hive2-action:0.1">
    <job-tracker>${jobTracker}</job-tracker>
    <name-node>${nameNode}</name-node>
    <jdbc-url>jdbc:hive2://${jdbcPath}</jdbc-url>
    <password>yarn</password>
    <script>${appPath}/addPartition.q</script>
    <param>nameNode=${nameNode}</param>
    <param>dt=${dt}</param>
    <param>path=${path}</param>
   </hive2>
   <ok to="end" />
   <error to="fail" />
  </action>
  <kill name="fail">
   <message>
    Workflow failed, error message[${wf:errorMessage(wf:lastErrorNode())}]
   </message>
  </kill>
  <end name="end" />
</workflow-app>

The executed script contains

CREATE EXTERNAL TABLE IF NOT EXISTS visits (sid BIGINT, os STRING, browser STRING, visit_time TIMESTAMP)
  PARTITIONED BY (dt STRING) 
  STORED AS PARQUET;

ALTER TABLE visits ADD PARTITION(dt = '${dt}') LOCATION   '${nameNode}/data/parquet/visitors/${path}'; 

If I run the job the table is created but no partition is added. In yarn log I find:

Beeline command arguments :
         -u
         jdbc:hive2://localhost:10000/default
         -n
         yarn
         -p
         yarn
         -d
         org.apache.hive.jdbc.HiveDriver
         --hivevar
         nameNode=hdfs://bigdata01.local:8020
         --hivevar
         dt=2016-01-05
         --hivevar
         path=2016/01/05
         -f
         addPartition.q
         -a
         delegationToken
         --hiveconf
         mapreduce.job.tags=oozie-1b3b2ee664df7ac9ee436379d784955a

Fetching child yarn jobs
tag id : oozie-1b3b2ee664df7ac9ee436379d784955a
Child yarn jobs are found - 
=================================================================

>>> Invoking Beeline command line now >>>
[...]
0: jdbc:hive2://localhost:10000/default> ALTER TABLE visits ADD    PARTITION(dt = '${dt}') LOCATION  '${nameNode}/data/parquet/visitors/${path}';

It looks as if the parameters in the ALTER TABLE are not replaced, to check this I tried to call beeline directly from the CLI:

beeline -u jdbc:hive2://localhost:10000/default -n yarn -p yarn -d org.apache.hive.jdbc.HiveDriver --hivevar nameNode=hdfs://bigdata01.local:8020 --hivevar dt="2016-01-03" --hivevar path="2016/01/03" -e "ALTER TABLE visits ADD PARTITION(dt='${dt}') LOCATION '${nameNode}/data/parquet/visitors/${path}';"

which results in an error:

Connecting to jdbc:hive2://localhost:10000/default
Connected to: Apache Hive (version 1.2.1000.2.4.2.0-258)
Driver: Hive JDBC (version 1.2.1000.2.4.2.0-258)
Transaction isolation: TRANSACTION_REPEATABLE_READ
Error: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. partition spec is invalid; field dt does not exist or is empty (state=08S01,code=1)

if I run the alter statement without parameters

beeline -u jdbc:hive2://localhost:10000/default -n yarn -p yarn -d org.apache.hive.jdbc.HiveDriver -e "ALTER TABLE visits ADD PARTITION(dt='2016-01-03') LOCATION 'hdfs://bigdata01.local:8020/data/parquet/visitors/2016/01/03';"

or open a beeline console with hivevars defined and execute the alter statement

beeline -u jdbc:hive2://localhost:10000/default -n yarn -p yarn -d org.apache.hive.jdbc.HiveDriver --hivevar nameNode=hdfs://bigdata01.local:8020 --hivevar dt="2016-01-03" --hivevar path="2016/01/03"
0: jdbc:hive2://localhost:10000/default>  ALTER TABLE visits ADD PARTITION(dt = '${dt}') LOCATION '${nameNode}/data/parquet/visitors/${path}';

the partition is created.

Where am I wrong?

Update:

The values for the parameters in the hive2 action are defined in the oozie.properties file and the coordinator.xml

<property>
  <name>nameNode</name>
  <value>${nameNode}</value>
</property> 
<property>
 <name>dt</name>                
 <value>${coord:formatTime(coord:dateOffset(coord:nominalTime(), -1,'DAY'),'yyyy-MM-dd')}</value>
</property>
<property>
  <name>path</name>                    
  <value>${coord:formatTime(coord:dateOffset(coord:nominalTime(), -1,'DAY'),'yyyy/MM/dd')}</value>
</property>

in yarn log you find

Parameters:
------------------------
nameNode=hdfs://bigdata01.local:8020
dt=2016-01-05
path=2016/01/05 

before they are set as hivevars in the beeline call from the hive2 action.

  • Can you please check Oozie console as to what parameters (along with their values) are being passed to your hive script and share it? This information is needed to solve this issue. – janeshs Jun 28 '16 at 09:28
  • I'm not sure if I understand what you want. I try to update the question with some more informations hope it helps. – Miguel San Jun 29 '16 at 06:41
  • Try using "-f" option (i.e using a external file) instead of "-e" option (i.e inline query execution) and see if that helps. $ beeline -f – janeshs Jun 29 '16 at 06:52
  • Already tried -f option; from the logs it seams as if hive2 action exactly does this. Looked nice on the console `0: jdbc:hive2://localhost:10000/default> ALTER TABLE visits ADD PARTITION(dt = '${dt}') LOCATION '${nameNode}/data/parquet/visitors/${path}';`but no partition was added. Looks as if beeline doesn't replace ${...} with the passed hivevars. – Miguel San Jun 29 '16 at 07:14
  • Then the option left with beeline is to open a beeline console with hivevars defined in ADVANCE and then execute the statement - as already tried by you. – janeshs Jun 29 '16 at 10:56

2 Answers2

0

Thanks for your help but I give up. Instead of a hive2 action I will use a ssh action to execute beeline with a static alter statement.

<ssh xmlns="uri:oozie:ssh-action:0.1">
  <host>${sshUser}@${sshHost}</host>
  <command>"beeline"</command>
  <args>-u</args>
  <args>jdbc:hive2://localhost:10000/default</args>
  <args>-n</args>
  <args>yarn</args>
  <args>-p</args>
  <args>yarn</args>
  <args>-d</args>
  <args>org.apache.hive.jdbc.HiveDriver</args>
  <args>-e</args>
  <args>"ALTER TABLE visits ADD PARTITION(dt='${dt}') LOCATION '${nameNode}/data/raw/parquet/visitors/${path}';"</args>
  <capture-output />
</ssh>
0

Finally found the problem. You have to use double quotes instead of single quotes ;-)

$ beeline -u jdbc:hive2://localhost:10000/default -n yarn -p yarn -d org.apache.hive.jdbc.HiveDriver --hivevar foo=bar -e "SELECT '${foo}' as foo;     

+------+--+
| foo  |  
+------+--+
|      |
+------+--+  

beeline -u jdbc:hive2://localhost:10000/default -n yarn -p yarn -d org.apache.hive.jdbc.HiveDriver --hivevar foo=bar -e 'SELECT "${foo}" as foo;'  

+------+--+
| foo  |
+------+--+
| bar  |
+------+--+

beeline -u jdbc:hive2://localhost:10000/default -n yarn -p yarn -d org.apache.hive.jdbc.HiveDriver --hivevar foo=bar -f selectFoo.q

+------+--+
| foo  |
+------+--+
| bar  |
+------+--+