I have a connection from AWS Glue to Oracle R12 and it seems to work fine when I test it in the "connections" section of AWS Glue:
p-*-oracleconnection connected successfully to your instance.
I can crawl all the tables etc. and get the whole schema without a problem.
However, as soon as I try to use these crawled tables in a Glue job, I get this error:
py4j.protocol.Py4JJavaError: An error occurred while calling o64.getDynamicFrame.
: java.sql.SQLRecoverableException: IO Error: The Network Adapter could not establish the connection
Connection string (sanitised, obviously):
jdbc:oracle:thin://@xxx.xxx.xxx.xxx:1000:FOOBAR
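Loading into a DynamicFrame:
import sys
from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext

# Resolve the job arguments passed as --JOB_NAME, --INPUT_DATABASE, --INPUT_TABLE_NAME
args = getResolvedOptions(sys.argv, ["JOB_NAME", "INPUT_DATABASE", "INPUT_TABLE_NAME"])
sc = SparkContext()
glueContext = GlueContext(sc)
spark = glueContext.spark_session
job = Job(glueContext)
job.init(args["JOB_NAME"], args)
# Read the crawled Oracle table from the Data Catalog (this is the call that fails)
datasource0 = glueContext.create_dynamic_frame.from_catalog(
    database=args["INPUT_DATABASE"],
    table_name=args["INPUT_TABLE_NAME"],
    transformation_ctx="datasource0",
)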
where the Glue job arguments are:
--INPUT_DATABASE p-*-source-database
--INPUT_TABLE_NAME foobar_xx_xx_animals
I have validated that both of these exist in AWS Glue.
Reasons I have to keep using Spark on Glue:
- Job Bookmark
Reasons I have to use Glue's built-in connections rather than connecting directly from Spark:
- VPC is needed
I just don't understand why I can crawl all the tables and get all the metadata, but as soon as I try to load one of them into a DynamicFrame it errors out.
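Since the exception is a network-level one ("The Network Adapter could not establish the connection"), a plain TCP check run from inside the job might help separate a routing or security-group problem from a JDBC one. This is only a minimal sketch: `can_connect` is a hypothetical helper name, not part of Glue, and the host and port would be the placeholder values from the connection string above.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        # create_connection resolves the host and attempts a TCP handshake
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures
        return False
```

Calling `can_connect("xxx.xxx.xxx.xxx", 1000)` near the top of the job script and logging the result would show whether the job can reach the Oracle listener at all. If it returns False there while the connection test succeeds, that would suggest the job is not running over the same network path as the crawler.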