I have to read data from two different buckets (bucket1 and bucket2) which are located in different regions (us-east-1 and us-east-2). My Apache Beam pipeline is as follows:
AWSCredentials credentials = new BasicAWSCredentials("*********", "************************************");
AwsOptions options = PipelineOptionsFactory.as(AwsOptions.class);
options.setRunner(DirectRunner.class);
options.setAwsCredentialsProvider(new AWSStaticCredentialsProvider(credentials));
options.setAwsRegion("us-east-1");
options.setAwsRegion("us-east-2");
Pipeline pipeline = Pipeline.create(options);
PCollection<String> bucket1Data = pipeline.apply(TextIO.read().from("s3://bucket1/data/employee.txt")); // located in us-east-1 region
PCollection<KV<Row, Row>> bucket1DataRows = bucket1Data
.apply(ParDo.of(new MyFunction(.....)));
PCollection<String> bucket2Data = pipeline.apply(TextIO.read().from("s3://bucket2/data/employee.txt")); // located in us-east-2 region
PCollection<KV<Row, Row>> bucket2DataRows = bucket2Data
.apply(ParDo.of(new MyFunction(.....)));
.... processing logics .....
final PipelineResult result = pipeline.run();
The code above needs the pipeline options to be set with both "us-east-1" and "us-east-2", since we are reading from buckets in both regions, but with this code I get the following exception:
Caused by: com.amazonaws.services.s3.model.AmazonS3Exception: The bucket is in this region: us-east-1. Please use this region to retry the request (Service: Amazon S3; Status Code: 301; Error Code: 301 Moved Permanently; Request ID: 8q8egbislbcgi; S3 Extended Request ID: DFV5637IYGOIUGB/9X+UKDVGILSDBC9LSUDBCLISdbCFr4=), S3 Extended Request ID: KGDVCUYLVsIC.VKs/9X+SDVLCIVSdbLIHVsdI;.sbdkC+Fr4=
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleErrorResponse(AmazonHttpClient.java:1712)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1367)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1113)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:770)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:744)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:726)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:686)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:668)
    at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:532)
    at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:512)
    at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4914)
    at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4860)
    at com.amazonaws.services.s3.AmazonS3Client.getObjectMetadata(AmazonS3Client.java:1320)
    at org.apache.beam.sdk.io.aws.s3.S3FileSystem.getObjectMetadata(S3FileSystem.java:358)
    at org.apache.beam.sdk.io.aws.s3.S3FileSystem.matchNonGlobPath(S3FileSystem.java:365)
    at org.apache.beam.sdk.io.aws.s3.S3FileSystem.lambda$matchNonGlobPaths$2(S3FileSystem.java:348)
    at org.apache.beam.sdk.util.MoreFutures.lambda$supplyAsync$0(MoreFutures.java:104)
    at java.util.concurrent.CompletableFuture$AsyncRun.run(CompletableFuture.java:1626)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
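As far as I can tell, the two setAwsRegion calls do not accumulate: AwsOptions stores a single region string, so the second call simply overwrites the first, and every S3 request then targets us-east-2, which matches the 301 above. A minimal stand-in (a sketch with a hypothetical RegionOptions class, not the actual Beam AwsOptions) illustrates the overwrite:

```java
// Stand-in for AwsOptions: a plain single-valued property, so only the
// last region that was set is ever used by the S3 filesystem.
class RegionOptions {
    private String awsRegion;

    void setAwsRegion(String region) {
        this.awsRegion = region; // replaces any previously set region
    }

    String getAwsRegion() {
        return awsRegion;
    }
}

public class RegionOverwriteDemo {
    public static void main(String[] args) {
        RegionOptions options = new RegionOptions();
        options.setAwsRegion("us-east-1");
        options.setAwsRegion("us-east-2"); // overwrites "us-east-1"
        System.out.println(options.getAwsRegion()); // prints "us-east-2"
    }
}
```

So calling the setter twice cannot pin different buckets to different regions; bucket1 in us-east-1 is requested against us-east-2 and the service answers with the 301 redirect shown in the trace.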
Is there any way to achieve this?

Thank you.