
I'm trying to execute spark-submit via the boto3 EMR client. After running the code below, the EMR step is submitted and fails after a few seconds. The actual command line taken from the step logs works when executed manually on the EMR master.

The controller log shows hardly readable garbage, as if several processes were writing to it concurrently.

UPD: I also tried command-runner.jar and EMR releases 4.0.0 and 4.1.0.

Any ideas appreciated.

The code fragment:

import boto3

class ProblemExample:
    def run(self):
        session = boto3.Session(profile_name='emr-profile')
        client = session.client('emr')
        # cluster_id is the 'j-...' id of an already running EMR cluster
        response = client.add_job_flow_steps(
            JobFlowId=cluster_id,
            Steps=[
                {
                    'Name': 'string',
                    'ActionOnFailure': 'CONTINUE',
                    'HadoopJarStep': {
                        'Jar': 's3n://elasticmapreduce/libs/script-runner/script-runner.jar',
                        'Args': [
                            '/usr/bin/spark-submit',
                            '--verbose',
                            '--class',
                            'my.spark.job',
                            '--jars', '<dependencies>',
                            '<my spark job>.jar'
                        ]
                    }
                },
            ]
        )

1 Answer


The problem was finally resolved by escaping the --jars value properly.

spark-submit was failing because it could not find the classes, but against the background of the messy logs the error was not obvious.

The valid example is:

import boto3

class Example:
    def run(self):
        session = boto3.Session(profile_name='emr-profile')
        client = session.client('emr')
        # cluster_id is the 'j-...' id of an already running EMR cluster
        response = client.add_job_flow_steps(
            JobFlowId=cluster_id,
            Steps=[
                {
                    'Name': 'string',
                    'ActionOnFailure': 'CONTINUE',
                    'HadoopJarStep': {
                        'Jar': 'command-runner.jar',
                        'Args': [
                            '/usr/bin/spark-submit',
                            '--verbose',
                            '--class',
                            'my.spark.job',
                            # literal single quotes keep the list as one argument
                            '--jars', '\'<comma, separated, dependencies>\'',
                            '<my spark job>.jar'
                        ]
                    }
                },
            ]
        )
  • Can you specify how you would provide comma-separated jars? I need to pass them and get the same error: ['--jars', '/home/hadoop/jar1.jar,/home/hadoop/jar2.jar'] – A.B May 29 '19 at 16:19
  • @A.B You need to escape single quotes inside the quoted string, like '\'j1.jar,j2.jar\'' – Robert Navado Jul 20 '19 at 18:43
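
To make the escaping from the comments concrete, here is a small sketch of a helper that wraps a list of jar paths in literal single quotes, producing the '--jars' value the answer uses. The helper name `quoted_jars` is my own; it is not part of boto3 or EMR.

```python
def quoted_jars(paths):
    """Join jar paths with commas and wrap the result in literal
    single quotes, so command-runner.jar hands spark-submit the
    whole list as a single --jars argument."""
    return "'" + ",".join(paths) + "'"

# Example: build the Args list for the HadoopJarStep
args = [
    '/usr/bin/spark-submit',
    '--class', 'my.spark.job',
    '--jars', quoted_jars(['/home/hadoop/jar1.jar', '/home/hadoop/jar2.jar']),
    'my-job.jar',
]
```

The single quotes end up inside the string itself, so they survive into the step's command line instead of being consumed by the Python parser.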