I am using the following to add a Spark job step to a running EMR 5.11.1 cluster on AWS, using Python 3.6.5, boto3, and Spark 2.2.1:

import os

import boto3

# my_emr_id (cluster ID) and key (step name) are defined elsewhere in my script.
myemr = boto3.client('emr', region_name=os.environ['AWS_DEFAULT_REGION'])
response = myemr.add_job_flow_steps(
    JobFlowId=my_emr_id,
    Steps=[
        {
            'Name': key,
            'ActionOnFailure': 'CONTINUE',
            'HadoopJarStep': {
                'Jar': 'command-runner.jar',
                'Args': [
                    'spark-submit',
                    '--deploy-mode', 'cluster',
                    '--master', 'yarn',
                    # Exposes my_password as an environment variable in the YARN application master
                    '--conf', 'spark.yarn.appMasterEnv.my_password=sensitive_value',
                    # Intended to redact any property whose name matches "password"
                    '--conf', 'spark.redaction.regex=password',
                    '--class', 'com.myApp', 's3a://myjarurl.jar',
                    '-c', 's3a://s3bucket_myconfig_location',
                    '-w', 'myconfig.json',
                    '-e', 'prod',
                    '-n', 'demo'
                ]
            }
        }
    ]
)

The step runs fine and is able to read the YARN environment variable. However, sensitive_value is not redacted in the logs or in the Steps tab of the AWS EMR console. Both display --conf, spark.yarn.appMasterEnv.my_password=sensitive_value.

I would like either the variable spark.yarn.appMasterEnv.my_password to be removed completely from the logs and console, or sensitive_value to be replaced with something like ***.

After reading the Apache Spark documentation at https://spark.apache.org/docs/2.2.1/configuration.html I expected this to work. I would appreciate any suggestions.
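
To sanity-check which property names a given pattern would catch, the matching can be tried locally. This is only a rough sketch: it assumes the pattern is searched against property names and that matching values are swapped for a placeholder. The `redact` helper and the `REDACTED` text are hypothetical and do not reproduce Spark's own implementation. It also says nothing about the EMR console, which may simply show the step arguments as they were submitted.

```python
import re

# Placeholder text is an assumption for illustration only.
REDACTED = "*********(redacted)"


def redact(conf, pattern):
    """Return a copy of conf with values masked where the key matches pattern."""
    regex = re.compile(pattern)
    return {
        key: (REDACTED if regex.search(key) else value)
        for key, value in conf.items()
    }


conf = {
    "spark.yarn.appMasterEnv.my_password": "sensitive_value",
    "spark.master": "yarn",
}

# Pattern taken from the comments on this question.
masked = redact(conf, "(?i)secret|password|dsn")
print(masked)
```

If the property name matches locally but the value still appears in the console, the pattern itself is probably not the problem.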

Ramesh Maharjan
BilboC
  • For me it worked with `"(?i)secret|password|dsn"`, as I need to mask entries with secret, password and dsn in them. The `(?i)` just makes the match case-insensitive. – martinarroyo Jul 30 '18 at 11:17
  • @martinarroyo - can you give a complete example of this? I am stuck on this and unable to proceed. – tenderfoot Apr 08 '20 at 14:35
  • Sorry for the late reply, I meant something like `spark.redaction.regex=(?i)secret|password|dsn`. – martinarroyo Apr 13 '20 at 19:13
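
A fuller version of what the comments describe might look like the sketch below. It only builds the step dictionary, using the step layout from the question with the suggested pattern. `build_spark_step` and `REDACTION_REGEX` are hypothetical names introduced here, and whether the pattern changes what the EMR console displays is not confirmed anywhere in this thread.

```python
# Pattern suggested in the comments; (?i) makes the match case-insensitive.
REDACTION_REGEX = "(?i)secret|password|dsn"


def build_spark_step(name, extra_conf, app_args):
    """Build one EMR step dict for spark-submit with the redaction regex set."""
    conf_args = ["--conf", "spark.redaction.regex=" + REDACTION_REGEX]
    for conf_key, conf_value in extra_conf.items():
        conf_args += ["--conf", "{}={}".format(conf_key, conf_value)]
    return {
        "Name": name,
        "ActionOnFailure": "CONTINUE",
        "HadoopJarStep": {
            "Jar": "command-runner.jar",
            "Args": [
                "spark-submit",
                "--deploy-mode", "cluster",
                "--master", "yarn",
            ] + conf_args + [
                "--class", "com.myApp", "s3a://myjarurl.jar",
            ] + list(app_args),
        },
    }


step = build_spark_step(
    "demo-step",
    {"spark.yarn.appMasterEnv.my_password": "sensitive_value"},
    ["-e", "prod", "-n", "demo"],
)
# Submit with the client from the question:
# myemr.add_job_flow_steps(JobFlowId=my_emr_id, Steps=[step])
```

Keeping the step construction in a plain function makes it possible to inspect the exact arguments before anything is sent to EMR.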
