
I have an AWS Glue job that uses Spark and Scala, with JDBC connections specified directly in the script for custom ETL and data decryption. When the job runs in an environment where the databases are not publicly accessible, it fails with a communications link failure. I have Glue and S3 VPC endpoints defined in the same VPC as the database, but still no success.

I also tried creating a dummy JDBC connection and a Network-type Glue connection, but after attaching the connection to the job, the job never finishes and just hangs.
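For context, the JDBC read in the script is along these lines (a minimal sketch; the URL, driver, credentials, and table name below are placeholders, not the real values):

```scala
import org.apache.spark.sql.SparkSession

object JdbcDecryptJob {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("glue-jdbc-etl")
      .getOrCreate()

    // Placeholder connection details -- the real job points at a private RDS endpoint.
    val df = spark.read
      .format("jdbc")
      .option("url", "jdbc:mysql://my-db.example.internal:3306/mydb")
      .option("driver", "com.mysql.cj.jdbc.Driver")
      .option("dbtable", "source_table")
      .option("user", "etl_user")
      .option("password", "********")
      .load() // schema resolution opens a connection here, which is where
              // the CommunicationsException in the stack trace surfaces

    df.show(5)
  }
}
```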

With the connection attached, the driver logs just repeat like this until I kill the job:

```
Jun 22, 2021, 7:46:47 AM 21/06/22 14:46:47 WARN ApacheUtils: NoSuchMethodException was thrown when disabling normalizeUri. This indicates you are using an old version (< 4.5.8) of Apache http client. It is recommended to use http client version >= 4.5.9 to avoid the breaking change introduced in apache client 4.5.7 and the latency in exception handling. See https://github.com/aws/aws-sdk-java/issues/1919 for more information
Jun 22, 2021, 7:46:48 AM 21/06/22 14:46:48 INFO Utils: Successfully started service 'sparkDriver' on port 37917.
Jun 22, 2021, 7:46:50 AM 21/06/22 14:46:50 INFO GlueContext: GlueMetrics configured and enabled
Jun 22, 2021, 7:47:48 AM 21/06/22 14:47:48 WARN EC2MetadataUtils: Unable to retrieve the requested metadata (/latest/user-data/). The requested metadata is not found at http://169.254.169.254/latest/user-data/
Jun 22, 2021, 7:47:48 AM 21/06/22 14:47:48 ERROR UserData: Error encountered while try to get user data
Jun 22, 2021, 7:47:48 AM 21/06/22 14:47:48 INFO MultipartUploadOutputStream: close closed:false s3://{{bucket}}/spark/spark-application-1624373208897.inprogress
Jun 22, 2021, 7:48:47 AM 21/06/22 14:48:47 INFO MultipartUploadOutputStream: close closed:false s3://{{bucket}}/spark/spark-application-1624373208897.inprogress
Jun 22, 2021, 7:49:47 AM 21/06/22 14:49:47 INFO MultipartUploadOutputStream: close closed:false s3://{{bucket}}/spark/spark-application-1624373208897.inprogress
Jun 22, 2021, 7:50:47 AM 21/06/22 14:50:47 INFO MultipartUploadOutputStream: close closed:false s3://{{bucket}}/spark/spark-application-1624373208897.inprogress
Jun 22, 2021, 7:51:47 AM 21/06/22 14:51:47 INFO MultipartUploadOutputStream: close closed:false s3://{{bucket}}/spark/spark-application-1624373208897.inprogress
Jun 22, 2021, 7:52:47 AM 21/06/22 14:52:47 INFO MultipartUploadOutputStream: close closed:false s3://{{bucket}}/spark/spark-application-1624373208897.inprogress
```

Without the connection attached, the job fails right away:

```
Jun 21, 2021, 8:29:12 PM 21/06/22 03:29:12 ERROR ProcessLauncher: InvocationTargetException java.lang.reflect.InvocationTargetException
Jun 21, 2021, 8:29:12 PM 21/06/22 03:29:12 ERROR ProcessLauncher: Exception in User Class
Jun 21, 2021, 8:29:12 PM 21/06/22 03:29:12 ERROR ProcessLauncher: Exception in User Class: com.mysql.cj.jdbc.exceptions.CommunicationsException : Communications link failure
  com.mysql.cj.jdbc.exceptions.SQLError.createCommunicationsException(SQLError.java:174)
  com.mysql.cj.jdbc.exceptions.SQLExceptionsMapping.translateException(SQLExceptionsMapping.java:64)
  com.mysql.cj.jdbc.ConnectionImpl.createNewIO(ConnectionImpl.java:836)
  com.mysql.cj.jdbc.ConnectionImpl.<init>(ConnectionImpl.java:456)
  com.mysql.cj.jdbc.ConnectionImpl.getInstance(ConnectionImpl.java:246)
  com.mysql.cj.jdbc.NonRegisteringDriver.connect(NonRegisteringDriver.java:198)
  org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$createConnectionFactory$1.apply(JdbcUtils.scala:63)
  org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$createConnectionFactory$1.apply(JdbcUtils.scala:54)
  org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$.resolveTable(JDBCRDD.scala:56)
  org.apache.spark.sql.execution.datasources.jdbc.JDBCRelation$.getSchema(JDBCRelation.scala:210)
  org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider.createRelation(JdbcRelationProvider.scala:35)
  org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:318)
```
  • If the RDS instance is set to `publiclyAccessible=False`, then resolution will always use the instance's private address. You said the services are in the same VPC, but are they in the same subnets? Do you have a security group on your RDS instance that allows traffic from whatever CIDR your connection would originate from? – Kisaragi Jun 22 '21 at 15:06
