3

I am working on AWS Glue Python Shell. I want to connect python shell with Oracle. I am successful installing psycopg2 and mysql libraries but when I tried to connect Oracle using cx_Oracle, I have successfully installed the library but I am facing the error

DatabaseError: DPI-1047: Cannot locate a 64-bit Oracle Client library: "libclntsh.so: cannot open shared object file: No such file or directory"

I have tried following things

  1. I have downloaded so files from S3 and placed it in lib folder in parallel to the code file

  2. I have set the LD_LIBRARY_PATH, ORACLE_HOME using os.environ

I am using following code

import boto3
import os
import sys
import site
from setuptools.command import easy_install

s3 = boto3.client('s3')
dir_path = os.path.dirname(os.path.realpath(__file__))
#os.path.dirname(sys.modules['__main__'].__file__)

install_path = os.environ['GLUE_INSTALLATION']
easy_install.main( ["--install-dir", install_path, "cx_Oracle"] )

importlib.reload(site)

import cx_Oracle

conn_str = u'{username}/{password}@{host}:{port}/{sid}'
conn = cx_Oracle.connect(conn_str)
c = conn.cursor()
c.execute(u'select * from hr.countries')
for row in c:
    print(row[0], "-", row[1])
conn.close()
print('hello I am here');

I should be able to connect with oracle on aws glue python shell

Zohaib Ijaz
  • 21,926
  • 7
  • 38
  • 60
Nouman Khalid
  • 68
  • 1
  • 6
  • 1
    LD_LIBRARY_PATH needs to be set before the process starts. If you're using Oracle Instant Client, then don't set ORACLE_HOME. Check the instructions on whichever OS download page you got Instant Client from https://www.oracle.com/database/technologies/instant-client/downloads.html – Christopher Jones Aug 21 '19 at 22:29
  • I have tried to set LD_LIBRARY_PATH using os.environ['LD_LIBRARY_PATH'] = 'path/so/files' but not working on aws glue python shell – Nouman Khalid Aug 22 '19 at 15:16
  • `os.environ()` is evaluated after the process starts, isn't it? `LD_LIBRARY_PATH` needs to be set before it starts. – Christopher Jones Aug 23 '19 at 04:05

1 Answers1

5

As it has already been mentioned in the responses. LD_LIBRARY_PATH needs to be set before the script starts. So, a way to avoid using LD_LIBRARY_PATH is setting rpath in the so file. Below are the steps needed.

You will need to update rpath in your so file. This can be done using patchelf package.

Please also include your libaio.so.1 in your so files which you might have generated by running sudo apt-get install libaio1

Installing patchelf

sudo apt-get update sudo apt-get install patchelf

To update rpath to your lib directory

patchelf --set-rpath <absolute_path_to_library_dir> libclntsh.so

Upload the so files with updated rpath to your glue env lib directory.

In your script you can then load the library.

from ctypes import *
cdll.LoadLibrary('<absolute_path_to_library_dir>/libclntsh.so')
Harjeet Singh
  • 51
  • 1
  • 2
  • Kudos to you, man. It took me a long time and a lot of frustration, but on the 68th try (when I found this answer), it finally worked. For me it works even without `cdll.LoadLibrary` stuff. Probably because `cx-Oracle` does that itself. Anyway, thanks a lot for this answer. – rhymefororange Aug 26 '21 at 19:21
  • 1
    Also, whatever loads the libraries in AWS Glue runtime appears not to follow symlinks. This is an issue, because the Oracle Instant Client package contains, `libclntsh.so`, which is a symlink to `libclntsh.so.21.1`. I had to copy `libclntsh.so.21.1` and put it in place of `libclntsh.so`, because I got `cx_Oracle.DatabaseError: DPI-1047: Cannot locate a 64-bit Oracle Client library: /libclntsh.so: file too short"`. The same goes for `libaio.so.1`. When I built it from source, the library was in `libaio.so.1.0.1`, and `libaio.so.1` was a symlink, so I had to copy it as well. – rhymefororange Aug 26 '21 at 19:31
  • Can you please let me know the steps you followed? I have added the libclntsh.so.21.1 file to Referenced files path but to no avail. How do I edit rpath - I am on windows – Ankit Goel Sep 02 '21 at 10:14
  • You have to rename libclntsh.so.21.1 to libclntsh.so, because the linking depends on that name, I guess. And regarding editing rpath on windows, I don't know. I edited on linux – rhymefororange Oct 08 '21 at 15:34
  • I get an error saying `Missing ELF header` when I try to do `patchelf --set-rpath libclntsh.so` – Pavan Andhukuri Oct 29 '21 at 19:10
  • what should be the absolute path to library directory? In glue it keep changing – SKB Nov 17 '21 at 18:45
  • 1
    I've upload libclntsh.so to S3 and configured referenced file path in aws glue job. During run time I see the file get download to /tmp/glue-python-scripts-f18bxst8 which get changed for every job run. Could you please let me know, - absolute path - should I need to upload the oracle client zip file which has all libraries or just libaio.so.1 and libclntsh.so to S3 – SKB Nov 17 '21 at 18:52
  • The updated `(rpath)` so files need to be part of the dist egg package of your python code and `__init__.py ` file should have below line of code for your custom package. `os.environ['PYTHON_EGG_CACHE'] = "/tmp"` Then you will be able to use the dynamic load of the libraries. `cdll.LoadLibrary("/tmp/libclntsh.so");` – Harjeet Singh Dec 17 '21 at 08:06
  • @rhymefororange Where are you adding so files. Is it S3? I tried to update the rpath with S3 location but still not working for me. Could you please share the steps? – sujays Feb 03 '22 at 13:58
  • 1
    @sujays Yes, the referenced files are going to be available in `/tmp/` on each run. What I did is: 1. When patching rpath, set it to a directory that will be visible in glue job runtime 2. Put the archive with patched libraries in S3 and put the s3 URI to it in Referenced files path in Glue job config 3. In glue job code, list every directory in `/tmp` until you find the one with your archive 4. Unpack that archive to the same dir you set in rpath 5. Call `cx_Oracle.init_oracle_client(lib_dir=f'{dir_name}/{archive_name}/{nested_dir}')` (lib_dir-dir with .so files) 6. Done – rhymefororange Feb 04 '22 at 17:06
  • @rhymefororange : "When patching rpath, set it to a directory that will be visible in glue job runtime". For me only /tmp location is accessible as I can see the files from S3 in referenced files is present in /tmp folder. Shall I set the rpath as /tmp. I dont think we have permission to use other directories – sujays Feb 09 '22 at 14:36
  • 1
    @sujays For example, if you downloaded oracle instant client libraries, patched and zipped the whole folder, your archive is named “instant-client-basic-linux.x64-21.3.0.0.0” and it contains a folder named “instantclient_21_3” with all the libraries. Then, when you iterate through random dirs in `/tmp`, you will find an archive with the name above and extract it to a static directory `/tmp/libs`. So then, your `rpath` would be `/tmp/libs/instant-client-basic-linux.x64-21.3.0.0.0/instantclient_21_3`, because that in the absolute path to client libraries. – rhymefororange Feb 10 '22 at 15:18
  • @rhymefororange Thanks. we able to connect oracle database with the steps you provided. I guess this is the only link which has proper information to use cx_oracle in glue. Thanks once again – sujays Feb 11 '22 at 07:05
  • @sujays glad it helped. I can post a separate answer here with step by step guide to make it a bit clearer if you'd like. – rhymefororange Feb 11 '22 at 14:14
  • @rhymefororange That would be much better. I had already created a question few days ago. You can answer it. https://stackoverflow.com/questions/70972408/unable-to-connect-oracle-database-using-cx-oracle-from-aws-glue?noredirect=1#comment125501425_70972408 – sujays Feb 11 '22 at 18:06
  • @sujays Done. Here is the answer https://stackoverflow.com/a/71095478/15235078 – rhymefororange Feb 12 '22 at 20:11