We are running a Samza job on Hadoop YARN. Until now we have been deploying the job manually by calling run-job.sh on the Resource Manager host:

run-job.sh --config-factory=org.apache.samza.config.factories.PropertiesConfigFactory --config-path=file:///usr/share/promo-rules-consumer/config/config.properties

The Samza deploy script and the Samza distribution tar "samza-dist.tar.gz" are both placed on the Resource Manager's local file system.

But now I would like to deploy jobs remotely. For this I am trying to use the Resource Manager's Submit Application REST API.

Request: POST http://hostname:8088/ws/v1/cluster/apps

Body:

{
    "application-id":"application_1470648527247_0031",
    "application-name":"test1_0",
    "am-container-spec":
    {

      "commands":
      {
        "command":"/usr/share/promo-rules-consumer/bin/run-job.sh --config-factory=org.apache.samza.config.factories.PropertiesConfigFactory --config-path=file:///usr/share/promo-rules-consumer/config/montecarlo.properties"
      }

    },
    "application-type":"SAMZA"
  }

However, I can see this error in the Resource Manager UI:

Exception message: /bin/bash: /usr/share/promo-rules-consumer/bin/run-job.sh: No such file or directory

Please tell me the correct way of doing this. Is there a link showing deployment of a Samza job through the REST API or through Java code?

Thanks

Coder
  • This may not currently be possible, since the JobRunner typically populates the coordinator stream with config. However, there has been some recent work (not yet in a released version) which allows the job to start up in environments similar to this one. – Jon Bringhurst Nov 22 '16 at 18:38

1 Answer

What we are doing is:

  1. Upload the tar.gz artifact to the remote cluster's HDFS (you can use WebHDFS):

    http --follow PUT 'http://namenode:50070/webhdfs/v1/user/someuser/location/samza-artifact.tar.gz?op=CREATE&user.name=someuser&overwrite=true' < /local-artifact-location/your-artifact-name-dist.tar.gz
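If httpie is not available, the same upload can be sketched with curl. WebHDFS CREATE is a two-step protocol: the NameNode replies with a 307 redirect to a DataNode, and the file content is streamed to that second URL. The hostnames and paths below are the same example values as above, not real ones:

```shell
# Build the WebHDFS CREATE URL (example hostnames/paths from the step above).
NAMENODE=namenode:50070
HDFS_PATH=/user/someuser/location/samza-artifact.tar.gz
LOCAL_TAR=/local-artifact-location/your-artifact-name-dist.tar.gz
URL="http://$NAMENODE/webhdfs/v1$HDFS_PATH?op=CREATE&user.name=someuser&overwrite=true"

# -L follows the NameNode's 307 redirect to the DataNode; -T streams the file.
# The echo lets you inspect the command first; drop it to actually upload.
echo curl -i -L -X PUT -T "$LOCAL_TAR" "$URL"
```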

  2. In the task properties, specify yarn.package.path, for example:

    yarn.package.path=hdfs://namenode:8020/user/someuser/location/samza-artifact.tar.gz
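For context, yarn.package.path sits alongside the other required Samza job settings in the properties file. A minimal illustrative fragment (job and class names are placeholders, not from the original post):

```properties
# Illustrative task properties -- only yarn.package.path comes from this answer;
# the rest are placeholder values for a typical Samza-on-YARN job.
job.factory.class=org.apache.samza.job.yarn.YarnJobFactory
job.name=your-job-name
yarn.package.path=hdfs://namenode:8020/user/someuser/location/samza-artifact.tar.gz
task.class=com.example.YourStreamTask
```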

  3. Samza needs to know your YARN settings. Copy the following files from your production Hadoop cluster and put them in /yarn-location/conf (note: the /conf subdirectory is important):

    • capacity-scheduler.xml
    • core-site.xml
    • log4j.properties
    • yarn-env.sh
    • yarn-site.xml
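A small sketch of this copy step. The source directory /etc/hadoop/conf is an assumption — your Hadoop distribution may keep the client configs elsewhere:

```shell
# Copy the cluster's client-side YARN config files into the conf/ directory
# that run-job.sh will read under HADOOP_YARN_HOME.
copy_yarn_conf() {
  src=$1; dst=$2
  mkdir -p "$dst/conf"
  for f in capacity-scheduler.xml core-site.xml log4j.properties yarn-env.sh yarn-site.xml; do
    cp "$src/$f" "$dst/conf/"
  done
}

# Typical invocation (both paths are assumptions):
# copy_yarn_conf /etc/hadoop/conf /yarn-location
```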
  4. Set the HADOOP_YARN_HOME environment variable (pointing at /yarn-location itself, without the conf suffix):

    export HADOOP_YARN_HOME=/yarn-location

  5. Run run-job.sh:

    bin/run-job.sh --config-factory=org.apache.samza.config.factories.PropertiesConfigFactory --config-path=/path-to-config/your-task.properties
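Putting the last two steps together, a small wrapper can assemble the submit command; it only prints the command so the paths can be sanity-checked before launching (all paths are assumptions):

```shell
# Compose the run-job.sh invocation from the YARN home and properties path.
# This prints the command rather than running it, as a pre-flight check.
submit_cmd() {
  yarn_home=$1; props=$2
  echo "HADOOP_YARN_HOME=$yarn_home bin/run-job.sh --config-factory=org.apache.samza.config.factories.PropertiesConfigFactory --config-path=$props"
}

submit_cmd /yarn-location /path-to-config/your-task.properties
```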

Michael