I have been using the Beam pipeline examples as a guide in an attempt to load files from S3 for my pipeline. As in the examples, I have defined my own PipelineOptions that also extends S3Options, and I am attempting to use the DefaultAWSCredentialsProviderChain. The code to configure this is:

MyPipelineOptions options = PipelineOptionsFactory.fromArgs(args).as(MyPipelineOptions.class);

options.setAwsCredentialsProvider(new DefaultAWSCredentialsProviderChain());
options.setAwsRegion("us-east-1");

runPipeline(options);

When I run it from IntelliJ it works fine using the Direct Runner, but when I package it as a jar and execute it (also using the Direct Runner) I see:

Exception in thread "main" java.lang.IllegalArgumentException: PipelineOptions specified failed to serialize to JSON.
    at org.apache.beam.runners.direct.DirectRunner.run(DirectRunner.java:166)
    at org.apache.beam.runners.direct.DirectRunner.run(DirectRunner.java:67)
    at org.apache.beam.sdk.Pipeline.run(Pipeline.java:313)
    at org.apache.beam.sdk.Pipeline.run(Pipeline.java:299)
    at a.b.c.beam.CleanSkeleton.runPipeline(CleanSkeleton.java:69)
    at a.b.c.beam.CleanSkeleton.main(CleanSkeleton.java:53)
Caused by: com.fasterxml.jackson.databind.JsonMappingException: Unexpected IOException (of type java.io.IOException): Failed to serialize and deserialize property 'awsCredentialsProvider' with value 'com.amazonaws.auth.DefaultAWSCredentialsProviderChain@40f33492'
    at com.fasterxml.jackson.databind.JsonMappingException.fromUnexpectedIOE(JsonMappingException.java:338)
    at com.fasterxml.jackson.databind.ObjectMapper.writeValueAsBytes(ObjectMapper.java:3247)
    at org.apache.beam.runners.direct.DirectRunner.run(DirectRunner.java:163)
    ... 5 more

I am using Gradle to build my jar with the following task:

jar {
    manifest {
        attributes (
                'Main-Class': 'a.b.c.beam.CleanSkeleton'
        )
    }
    from {
        configurations.runtimeClasspath.collect { it.isDirectory() ? it : zipTree(it) }
    }
    from('src') {
        include '/main/resources/*'
    }
    zip64 true
    exclude 'META-INF/*.RSA', 'META-INF/*.SF', 'META-INF/*.DSA'
}
Dean
  • How do you package it as a jar, can you share the command? Do you have `DefaultAWSCredentialsProviderChain` on your classpath when running the packaged jar? Not having it on the classpath would be my main suspect. IntelliJ and tests are likely setting it up automatically (e.g. it is specified in the build file), but when you manually execute a jar you have to make sure you have the correct dependencies available at runtime. – Anton Jul 15 '19 at 16:54
  • I have edited my question to show how the jar is made. – Dean Jul 15 '19 at 18:24
  • What does your dependencies section look like in the `build.gradle`? Do you have all the AWS dependencies there? – Anton Jul 16 '19 at 18:32
  • I have all the AWS dependencies that I require. Determined the problem: META-INF/services files were being overwritten when creating the fat jar. – Dean Jul 16 '19 at 20:45

1 Answer

The problem occurred because, when the fat/uber jar was being created, files in META-INF/services were being overwritten by duplicate files of the same name. Specifically, META-INF/services/com.fasterxml.jackson.databind.Module needs to list a number of Jackson modules, but several were missing, including org.apache.beam.sdk.io.aws.options.AwsModule and com.fasterxml.jackson.datatype.joda.JodaModule. Without AwsModule registered, Jackson cannot serialize the awsCredentialsProvider property, which is what the exception reports. The code in the DirectRunner instantiates the ObjectMapper like so:

new ObjectMapper()
      .registerModules(ObjectMapper.findModules(ReflectHelpers.findClassLoader()));

ObjectMapper::findModules relies on java.util.ServiceLoader which locates services from META-INF/services/ files.
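One way to confirm this diagnosis is to list every META-INF/services file for the Jackson Module service that is visible on the classpath. The following is only a minimal sketch using the JDK alone (Java 9+); the class name ServiceFileCheck and its helper methods are hypothetical and not part of Beam or Jackson.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical diagnostic helper, not part of Beam or Jackson.
public class ServiceFileCheck {

    // Parses the content of a provider-configuration file:
    // one class name per line, '#' starts a comment, blank lines are ignored.
    static List<String> parseProviders(String content) {
        List<String> providers = new ArrayList<>();
        for (String line : content.split("\\R")) {
            int hash = line.indexOf('#');
            if (hash >= 0) {
                line = line.substring(0, hash);
            }
            line = line.trim();
            if (!line.isEmpty()) {
                providers.add(line);
            }
        }
        return providers;
    }

    // Reads every META-INF/services/<serviceName> resource the class loader can see
    // and returns all provider class names declared across them.
    static List<String> providersOnClasspath(ClassLoader loader, String serviceName)
            throws IOException {
        List<String> all = new ArrayList<>();
        String resource = "META-INF/services/" + serviceName;
        for (URL url : Collections.list(loader.getResources(resource))) {
            try (InputStream in = url.openStream()) {
                String content = new String(in.readAllBytes(), StandardCharsets.UTF_8);
                List<String> found = parseProviders(content);
                System.out.println(url + " -> " + found);
                all.addAll(found);
            }
        }
        return all;
    }

    public static void main(String[] args) throws IOException {
        String service = "com.fasterxml.jackson.databind.Module";
        List<String> providers =
                providersOnClasspath(ServiceFileCheck.class.getClassLoader(), service);
        System.out.println(providers.size() + " provider(s) declared for " + service);
    }
}
```

Run from the IDE, this should print one URL per dependency jar that ships a services file. Run from inside a fat jar built without merging, it would be expected to print a single file containing only the entries of whichever dependency was written last.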

The solution was to use the Gradle Shadow plugin to build the fat/uber jar and configure it to merge the services files:

apply plugin: 'com.github.johnrengelman.shadow'
shadowJar {
    mergeServiceFiles()
    zip64 true
}
Dean