
I am extending the BigQueryTornadoes example from https://github.com/apache/beam so that it writes to AWS S3 as a sink. In my first iteration, I was able to make it work with the following code.

    public static void main(String[] args) {
        Options options = PipelineOptionsFactory.fromArgs(args).withValidation().as(Options.class);

        options.setAwsCredentialsProvider(
                new AWSStaticCredentialsProvider(
                        new BasicAWSCredentials(options.getAwsAccessKey().get(), options.getAwsSecretKey().get())));

        runBigQueryTornadoes(options);
    }

For my second iteration, I wanted to use STSAssumeRoleSessionCredentialsProvider to support cross-account IAM roles. I have the following code.

    public static void main(String[] args) {
        Options options = PipelineOptionsFactory.fromArgs(args).withValidation().as(Options.class);

        AWSCredentialsProvider provider = new AWSStaticCredentialsProvider(new BasicAWSCredentials(options.getAwsAccessKey().get(), options.getAwsSecretKey().get()));
        AWSSecurityTokenServiceClientBuilder stsBuilder = AWSSecurityTokenServiceClientBuilder.standard().withCredentials(provider);
        AWSSecurityTokenService sts = stsBuilder.build();

        AWSCredentialsProvider credentialsProvider = new STSAssumeRoleSessionCredentialsProvider.Builder(options.getAwsRoleArn().get(), options.getAwsRoleSession().get())
                .withExternalId(options.getAwsExternalId().get())
                .withStsClient(sts)
                .build();
        options.setAwsCredentialsProvider(credentialsProvider);

        runBigQueryTornadoes(options);
    }

When I run the code above, I get the following exception.

Caused by: com.fasterxml.jackson.databind.JsonMappingException: Unexpected IOException (of type java.io.IOException): Failed to serialize and deserialize property 'awsCredentialsProvider' with value 'com.amazonaws.auth.STSAssumeRoleSessionCredentialsProvider@4edb24da'
    at com.fasterxml.jackson.databind.JsonMappingException.fromUnexpectedIOE (JsonMappingException.java:338)
    at com.fasterxml.jackson.databind.ObjectMapper.writeValueAsBytes (ObjectMapper.java:3432)
    at org.apache.beam.runners.direct.DirectRunner.run (DirectRunner.java:163)
    at org.apache.beam.runners.direct.DirectRunner.run (DirectRunner.java:67)
    at org.apache.beam.sdk.Pipeline.run (Pipeline.java:317)
    at org.apache.beam.sdk.Pipeline.run (Pipeline.java:303)
    at org.apache.beam.examples.cookbook.BigQueryTornadoesS3STS.runBigQueryTornadoes (BigQueryTornadoesS3STS.java:251)
    at org.apache.beam.examples.cookbook.BigQueryTornadoesS3STS.main (BigQueryTornadoesS3STS.java:267)
    at sun.reflect.NativeMethodAccessorImpl.invoke0 (Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke (NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke (DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke (Method.java:498)
    at org.codehaus.mojo.exec.ExecJavaMojo$1.run (ExecJavaMojo.java:282)
    at java.lang.Thread.run (Thread.java:748)

I ran with the following mvn command.

mvn compile exec:java -Dexec.mainClass=org.apache.beam.examples.cookbook.BigQueryTornadoesS3STS "-Dexec.args=..." -P direct-runner

I saw a similar post, "Beam: Failed to serialize and deserialize property 'awsCredentialsProvider'", but I am hitting this issue without packaging my pipeline as a jar.

1 Answer


The post "I am trying to write to S3 using assumeRole via FileIO with ParquetIO" helped me make my code work. With the code below, I was able to assume the cross-account IAM role and write to an S3 bucket owned by another AWS account.

    public static void main(String[] args) {
        Options options = PipelineOptionsFactory.fromArgs(args).withValidation().as(Options.class);

        AWSCredentialsProvider provider = new AWSStaticCredentialsProvider(new BasicAWSCredentials(options.getAwsAccessKey().get(), options.getAwsSecretKey().get()));
        AWSSecurityTokenServiceClientBuilder stsBuilder = AWSSecurityTokenServiceClientBuilder.standard().withCredentials(provider);
        AWSSecurityTokenService sts = stsBuilder.build();

        STSAssumeRoleSessionCredentialsProvider credentials = new STSAssumeRoleSessionCredentialsProvider.Builder(options.getAwsRoleArn().get(), options.getAwsRoleSession().get())
                .withExternalId(options.getAwsExternalId().get())
                .withStsClient(sts)
                .build();

        // Resolve the assumed-role session credentials once, then wrap the
        // snapshot in a static provider that Beam can serialize.
        options.setAwsCredentialsProvider(
                new AWSStaticCredentialsProvider(
                        credentials.getCredentials()));

        runBigQueryTornadoes(options);
    }
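For context on why this works: the runner serializes the PipelineOptions to JSON with Jackson, and Beam's AWS module only knows how to round-trip a handful of provider types such as AWSStaticCredentialsProvider, whereas STSAssumeRoleSessionCredentialsProvider carries a live STS client that cannot be reconstructed from JSON. The stand-in sketch below (using plain Java serialization and hypothetical classes, not the real AWS SDK or Beam machinery) illustrates the principle: a one-time snapshot of resolved values serializes, while the dynamic provider does not.

```java
import java.io.*;

// Hypothetical stand-ins for the AWS classes, for illustration only.
class StsClient {}                                  // not Serializable, like a live HTTP client

class AssumeRoleProvider implements Serializable {
    final StsClient client = new StsClient();       // this field breaks serialization
    String getToken() { return "session-token"; }   // the resolved, plain value
}

class StaticProvider implements Serializable {
    final String token;                             // only plain values, so it serializes
    StaticProvider(String token) { this.token = token; }
}

public class SnapshotDemo {
    public static boolean serializes(Object o) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(o);
            return true;
        } catch (IOException e) {                   // NotSerializableException lands here
            return false;
        }
    }

    public static void main(String[] args) {
        AssumeRoleProvider dynamic = new AssumeRoleProvider();
        // The dynamic provider drags its client along and fails to serialize...
        System.out.println(serializes(dynamic));                                 // false
        // ...while a snapshot of the resolved credentials succeeds.
        System.out.println(serializes(new StaticProvider(dynamic.getToken()))); // true
    }
}
```

One trade-off to keep in mind: the snapshot is fixed at submission time, so the session credentials will not refresh and the pipeline must finish before they expire.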

Note: The code is based on the BigQueryTornadoes example from https://github.com/apache/beam.