0

While running AWS Glue Python shell (not using Spark) I want to connect with Oracle. I was successful doing all the stuff (described in the link below) in dev_endpoint or in my virtual machine, but my goal is to have it AWS Glue Python Shell. Connection with Oracle cx_Oracle problem with AWS Glue Python Shell

All the libraries in AWS Glue Python Shell must come as .whl or .egg packages - then they are installed. But AWS Glue is serverless and I wasn't able to find where they were installed - so that I could set up rpath correctly.

How to know absolute_path_to_library_dir?

John Rotenstein
  • 241,921
  • 22
  • 380
  • 470
andrzejkuba
  • 1
  • 1
  • 1

1 Answers1

0

As glue is serverless, there is no /path/to/library/dir.

python processes in glue need a couple of things to connect to external databases such as your oracle server.

  1. python libraries must be packaged as an .egg or .whl, uploaded to s3, and the location of these files must be specified when creating a job (the field Python Library Path). this applies to any library that you author or which you would normally pip install but these are not available in the environment that aws provides for glue processes. So you'd need to build an .egg for cx_Oracle locally, upload to s3, and provide the path in Python Library Path when creating the your job. If you have already created the job, you can edit the job and provide the s3-path-to-cs-oracle.egg

  2. secrets such as connection credentials must be fetched from an secure external service by the etl script. One option is to store the oracle connection credentials in glue. From the aws glue console, go to connections, add a jdbc connection and save your database credentials.

  3. In your etl script, use boto3.client('glue').get_connection to retrieve the connection details, and using the user uploaded cx_Oracle library, connect to the database. Here's an example snippet that you would need to adapt & include in your etl script

snippet:

import boto3
import cx_Oracle as orcl

glue = boto3.client('glue')
resp = glue.get_connection(Name='my-oracle-connection')
props = resp['Connection']['ConnectionProperties']
dsn = props['JDBC_CONNECTION_URL'].split('//')[-1]
user = props['USERNAME']
pw = props['PASSWORD']
db = orcl.connect(user, pw, dsn)
#^ `db` should be a connection to your oracle db
Haleemur Ali
  • 26,718
  • 5
  • 61
  • 85
  • 2
    I am getting this error: cx_Oracle.DatabaseError: DPI-1047: Cannot locate a 64-bit Oracle Client library: "libclntsh.so: cannot open shared object file: No such file or directory". libclntsh.so comes with oracle instant client package. As mentioned in https://stackoverflow.com/questions/57598027/connection-with-oracle-cx-oracle-problem-with-aws-glue-python-shell there should be LD_LIBRARY_PATH or rpath set up - but this cannot be done in AWS Glue Python Shell because as you mentioned glue is server-less. Is there any way around? – andrzejkuba Apr 17 '20 at 08:44
  • Please follow this answer https://stackoverflow.com/a/59467059/15235078 to produce a patched client and read my comments to this answer for instructions on how to set up your Glue job. I know this is still an actual problem for a lot of people who just stumbled upon this. I managed to fit the steps in a comment, but tell me if you'd like a separate answer with a step by step guide. – rhymefororange Feb 04 '22 at 17:29