
For my requirement, I need to join data in a PostgreSQL database (hosted on RDS) with a file in an S3 bucket. I have created a Glue job (Spark/Scala) that should connect to both PostgreSQL and the S3 bucket and complete the processing.

But the Glue job hits a connection timeout when connecting to S3 (error message below). It fetches data from PostgreSQL successfully.

There is no permission-related issue with S3, because I am able to read from and write to the same S3 bucket/path from a different job. The exception occurs only when I try to connect to both PostgreSQL and S3 in a single Glue job/script.

In the Glue job, the GlueContext is created from a SparkContext object. I also tried creating two separate SparkSessions, one for S3 and one for the PostgreSQL database, but that approach didn't work either; the same timeout occurred.
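For reference, the job layout described above looks roughly like this. This is a minimal sketch only: the JDBC URL, table name, S3 path, and join keys are placeholders, not the actual job.

```scala
import org.apache.spark.sql.SparkSession

// Minimal sketch of the setup described above.
// All names (endpoint, db, table, bucket path, columns) are placeholders.
object JoinRdsWithS3 {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("join-rds-s3").getOrCreate()

    // Read from PostgreSQL on RDS -- this part succeeds.
    val pgDf = spark.read
      .format("jdbc")
      .option("url", "jdbc:postgresql://<rds-endpoint>:5432/<db>")
      .option("dbtable", "<table>")
      .option("user", "<user>")
      .option("password", "<password>")
      .load()

    // Read the file from S3 -- this is where the connect timeout occurs.
    val s3Df = spark.read.option("header", "true").csv("s3://emp_bucket/<path>/")

    // Join and continue processing.
    val joined = pgDf.join(s3Df, Seq("<join_column>"))
    joined.show()
  }
}
```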

Please help me resolve the issue.

Error/Exception from the log:

ERROR [main] glue.processLauncher (Logging.scala:logError(91)): Exception in User Class com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.SdkClientException: Unable to execute HTTP request: Connect to emp_bucket.s3.amazonaws.com:443 [emp_bucket.s3.amazonaws.com/] failed: connect timed out

Swapnil

1 Answer


This is fixed now.

The issue was with the security group: earlier, only limited TCP traffic was allowed. As part of the fix, traffic was opened for all, and an HTTPS rule was added to the inbound rules as well.
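Assuming the security group attached to the Glue connection, the HTTPS inbound rule can be added with the AWS CLI like this (the group ID below is a placeholder):

```shell
# Allow inbound HTTPS (TCP 443) on the security group used by the
# Glue connection; sg-0123456789abcdef0 is a placeholder group ID.
aws ec2 authorize-security-group-ingress \
    --group-id sg-0123456789abcdef0 \
    --protocol tcp \
    --port 443 \
    --cidr 0.0.0.0/0
```

Note that opening traffic to 0.0.0.0/0 is the broadest possible rule; you may want to restrict the CIDR to what your job actually needs.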

Swapnil