
I'm creating an Oozie workflow where I need multiple shell actions, but I'm facing the problem that I have to declare the same environment variables for every shell action in my workflow, meaning that if I have 10 shell actions I need to declare them 10 times. My question is: is there any way to declare/create global variables so I can avoid duplicating variables that do the same thing?

Example:

    job.properties
    oozie.use.system.libpath=true
    security_enabled=False
    dryrun=False
    nameNode=hdfs://localhost:8020
    user_name=test
    jobTracker=localhost:8032

    <workflow-app name="My_Workflow" xmlns="uri:oozie:workflow:0.5">
        <start to="shell-a0a5"/>
        <kill name="Kill">
            <message>Error [${wf:errorMessage(wf:lastErrorNode())}]</message>
        </kill>
        <action name="shell-a0a5">
            <shell xmlns="uri:oozie:shell-action:0.1">
                <job-tracker>${jobTracker}</job-tracker>
                <name-node>${nameNode}</name-node>
                <exec>script1.sh</exec>
                <file>/user/hive/script1.sh#script1.sh</file>
            </shell>
            <ok to="End"/>
            <error to="Kill"/>
        </action>
        <end name="End"/>
    </workflow-app>

My script1.sh expects a parameter named user_name, which I have declared in job.properties, but it's not working in my workflow: I'm getting "missing argument username".

I would like to know how I can send parameters to a shell script from a global configuration file.

Thanks


2 Answers


I was not able to create global parameters to pass values such as user & password or HADOOP_USER_NAME (in my case), but I was able to work around it using the shell script itself. Within the shell I define the following for my purpose:

    export HADOOP_USER_NAME=admin
    connection=$(hdfs dfs -cat /user/connection.txt)

where connection.txt contains all the information for the connection string. Then, using Sqoop, I pass that info within the shell file in this way:

    sqoop $connection --table test --target-dir /user/hive/warehouse/Jeff.db/test/ -m 1 --delete-target-dir
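
For illustration, connection.txt could simply hold the shared Sqoop connection arguments; the JDBC URL, user, and password file below are hypothetical. Because $connection is expanded unquoted, the shell word-splits the file's contents into individual arguments:

    --connect jdbc:mysql://dbhost:3306/Jeff
    --username admin
    --password-file /user/admin/sqoop.password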

and in this way I was able to resolve my problem. I still had to declare some global variables, but those were necessary to execute the Sqoop jobs in parallel using &.
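
For reference, a minimal sketch of that parallel pattern, reusing the same $connection variable (the second table name is a hypothetical placeholder):

    # run both imports in the background, then wait for them to finish
    sqoop $connection --table test --target-dir /user/hive/warehouse/Jeff.db/test/ -m 1 --delete-target-dir &
    sqoop $connection --table test2 --target-dir /user/hive/warehouse/Jeff.db/test2/ -m 1 --delete-target-dir &
    wait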


Passing global parameters to a shell action is not possible; the global section is only for properties. For more details see the answer to this question: OOZIE: properties defined in file referenced in global job-xml not visible in workflow.xml.

To pass parameters/variables to a shell action you can either pass those values as arguments via the shell action (you can still declare them in your job.properties file):

    <action name="shell-<name>">
        <shell xmlns="uri:oozie:shell-action:0.3">
            <exec>script1.sh</exec>
            <argument>${user_name}</argument>
            <argument>${database}</argument>
            <argument>${etc}</argument>
            <file>/user/hive/script1.sh#script1.sh</file>
        </shell>
        <ok to="End"/>
        <error to="Kill"/>
    </action>
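
For illustration, the matching entries in job.properties could look like this (user_name comes from the question's properties file; the database value is a hypothetical placeholder):

    user_name=test
    database=mydb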

In your shell script you can then read these arguments like this:

    #!/bin/bash -e
    # positional arguments passed in by the Oozie shell action
    user_name=${1}
    database=${2}
    etc=${3}
    <your shell commands>

You can also use just $1, $2, etc. directly, but for readability it's better to assign your arguments to named variables first.

Alternatively, to avoid passing lots of arguments to each and every shell action, you can add a config file with all these parameters to your shell action and source that file in your actual shell script:

    <action name="shell-<name>">
        <shell xmlns="uri:oozie:shell-action:0.3">
            <exec>script1.sh</exec>
            <file>/user/hive/script1.sh#script1.sh</file>
            <file>/user/hive/CONFIG_FILE</file>
        </shell>
        <ok to="End"/>
        <error to="Kill"/>
    </action>

shell script:

    #!/bin/bash
    # source the config file shipped alongside the script via <file>
    . CONFIG_FILE

    <your shell commands>
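
For illustration, CONFIG_FILE would then just contain plain shell variable assignments, which the dot (source) command loads into the script; the values here are hypothetical:

    # CONFIG_FILE (hypothetical values)
    user_name=test
    database=mydb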