
Step 1: AssumeRole

public static AWSCredentialsProvider getCredentials() {
    // roleARN and Constants.SESSION_NAME are defined elsewhere in the application.
    if (roleARN.length() > 0) {
        // Assume the configured role via STS and return a provider for its temporary credentials.
        STSAssumeRoleSessionCredentialsProvider credentialsProvider = new STSAssumeRoleSessionCredentialsProvider
                .Builder(roleARN, Constants.SESSION_NAME)
                .withStsClient(AWSSecurityTokenServiceClientBuilder.defaultClient())
                .build();
        return credentialsProvider;
    }
    // No role configured: fall back to the local profile credentials.
    return new ProfileCredentialsProvider();
}

Step 2: Set credentials on the pipeline

credentials = getCredentials();
pipeline.getOptions().as(AwsOptions.class).setAwsRegion(Regions.US_WEST_2.getName());
pipeline.getOptions().as(AwsOptions.class).setAwsCredentialsProvider(
        new AWSStaticCredentialsProvider(
                new BasicAWSCredentials(
                        credentials.getCredentials().getAWSAccessKeyId(),
                        credentials.getCredentials().getAWSSecretKey())));
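
Note: the STS credentials from Step 1 are temporary and also include a session token, which BasicAWSCredentials cannot carry. A rough sketch of passing all three fields instead (assuming getCredentials() returned the STS provider from Step 1) would be:

// Sketch only: snapshot the temporary STS credentials, including the session token.
// Assumes getCredentials() returned the STSAssumeRoleSessionCredentialsProvider from Step 1;
// the snapshot is static and is not refreshed when the session expires.
AWSSessionCredentials sessionCredentials =
        (AWSSessionCredentials) getCredentials().getCredentials();
pipeline.getOptions().as(AwsOptions.class).setAwsCredentialsProvider(
        new AWSStaticCredentialsProvider(
                new BasicSessionCredentials(
                        sessionCredentials.getAWSAccessKeyId(),
                        sessionCredentials.getAWSSecretKey(),
                        sessionCredentials.getSessionToken())));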

Step 3: Run the pipeline to write to S3

// Convert formatted events into Avro GenericRecords for the Parquet sink.
PCollection<GenericRecord> parquetRecord = formattedEvent
        .apply("ParquetRecord", ParDo.of(new ParquetWriter()))
        .setCoder(AvroCoder.of(getOutput_schema()));

// Write the records to S3 as snappy-compressed Parquet files.
parquetRecord.apply(FileIO.<GenericRecord, GenericRecord>writeDynamic()
        .by(elm -> elm)
        .via(ParquetIO.sink(getOutput_schema()))
        .to(outputPath).withNumShards(1)
        .withNaming(type -> FileNaming.getNaming("part", ".snappy.parquet", "" + DateTime.now().getMillisOfSecond()))
        .withDestinationCoder(AvroCoder.of(getOutput_schema())));

I am using 'org.apache.beam:beam-sdks-java-io-parquet:jar:2.22.0' and 'org.apache.beam:beam-sdks-java-io-amazon-web-services:jar:2.22.0'.

Issue: Currently, assumeRole does not seem to be working.

Errors:

org.apache.beam.sdk.util.UserCodeException: java.lang.RuntimeException: org.apache.beam.sdk.util.UserCodeException: java.io.IOException: com.amazonaws.services.s3.model.AmazonS3Exception: The AWS Access Key Id you provided does not exist in our records.

Or

Caused by: com.fasterxml.jackson.databind.JsonMappingException: Unexpected IOException (of type java.io.IOException): Failed to serialize and deserialize property 'awsCredentialsProvider' with value 'com.amazonaws.auth.InstanceProfileCredentialsProvider@71262020'
smac2020

2 Answers


The recent Beam release (2.24.0) has the feature to assume a role.
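
A rough sketch of what that could look like on 2.24.0, assuming the upgraded AWS module can serialize the STS provider when it is set on the pipeline options (roleARN and the session name below are placeholders):

// Sketch for Beam 2.24.0+: set the assume-role provider directly on AwsOptions and let the
// SDK serialize it with the pipeline options. roleARN and "beam-session" are placeholders.
AwsOptions awsOptions = pipeline.getOptions().as(AwsOptions.class);
awsOptions.setAwsRegion(Regions.US_WEST_2.getName());
awsOptions.setAwsCredentialsProvider(
        new STSAssumeRoleSessionCredentialsProvider.Builder(roleARN, "beam-session")
                .withStsClient(AWSSecurityTokenServiceClientBuilder.defaultClient())
                .build());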


Where do you run this pipeline from (in an AWS account)? If so, it is better to grant assume-role access to the role that runs the pipeline; then, from within the pipeline, FileIO will just use the default AWS client.

It is better to shift the assume-role operation out of the pipeline and simply grant S3 permissions to the role running the pipeline.
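
A rough sketch of that approach, assuming the environment running the pipeline has a role attached with the needed S3 permissions, so the default provider chain can pick the credentials up on its own:

// Sketch: rely on the role attached to the environment running the pipeline instead of
// assuming a role inside it. DefaultAWSCredentialsProviderChain resolves environment or
// instance-profile credentials automatically.
AwsOptions awsOptions = pipeline.getOptions().as(AwsOptions.class);
awsOptions.setAwsRegion(Regions.US_WEST_2.getName());
awsOptions.setAwsCredentialsProvider(DefaultAWSCredentialsProviderChain.getInstance());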

Amit Kumar
  • Thanks for the response. The pipeline is running on AWS account A and writing files to account B using assumeRole, so multiple accounts can access the files in account B. Currently the s3client write works with assumeRole, but FileIO does not seem to work when I pass the assumeRole credentials to the pipeline. The same code works locally, though. – Julius Almeida Jun 20 '20 at 05:12