I have been trying to import an external Python library in an AWS Glue Python shell job.

  1. I uploaded the .whl file for pyodbc to S3.
  2. I referenced its S3 path under "Python library path" in the additional properties of the Glue job.
  3. I also tried passing the job parameter --extra-py-files with the S3 path of the .whl file as its value.
  4. Whenever I write "from pyodbc import pyodbc as db" or just "import pyodbc", the job fails with "ModuleNotFoundError: No module named 'pyodbc'".
  5. The logs are shown below:

Processing ./glue-python-libs-cq4p0rs8/pyodbc-4.0.32-cp310-cp310-win_amd64.whl Installing collected packages: pyodbc Successfully installed pyodbc-4.0.32

WARNING: The directory '/.cache/pip' or its parent directory is not owned or is not writable by the current user. The cache has been disabled. Check the permissions and owner of that directory. If executing pip with sudo, you may want sudo's -H flag.

File "/tmp/glue-python-scripts-g_mt5xzp/Glue-ETL-Dev.py", line 2, in <module> ModuleNotFoundError: No module named 'pyodbc'

I am downloading the wheel files from here: https://pypi.org/project/pyodbc/#files

No matter which version of the .whl file I reference in the Glue job, it always throws the same error.

Can anyone tell me where it's going wrong?

  • Is it because I am using the wrong .whl file? If so, where can I find the right one? – srikanth sanapathi May 16 '22 at 10:51
  • I think your suspicions are correct. [PyPI only has pyodbc .whl files for Windows and macOS](https://pypi.org/project/pyodbc/#description). And since pyodbc has C++ extensions, you'll have to build one on a Linux platform. Here's a [link](https://aws.amazon.com/premiumsupport/knowledge-center/glue-version2-external-python-libraries/) that might help – Bob Haffner May 17 '22 at 02:28
  • The issue was resolved after using a proper Linux wheel file – srikanth sanapathi May 31 '22 at 10:50
  • Did you find one or did you have to build it? – Bob Haffner May 31 '22 at 14:54
  • I found it in https://pypi.org/project/pymssql/#files – srikanth sanapathi Jun 01 '22 at 05:58
  • @BobHaffner I am now trying to switch to pyodbc, since pymssql's executemany is too slow at loading data. But I am not able to generate a Linux wheel file for pyodbc as suggested here: https://aws.amazon.com/premiumsupport/knowledge-center/glue-job-use-external-python-libraries/. The error is that it can't find pip. – srikanth sanapathi Jun 29 '22 at 11:14
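
The platform mismatch suggested in the comments above can be spotted from the wheel's filename alone. Below is a minimal sketch, assuming the standard wheel naming scheme (name, version, optional build tag, Python tag, ABI tag, platform tag) and assuming the Glue job runs on Linux, as the comments indicate. `wheel_tags` and `looks_linux_installable` are hypothetical helper names, and the `example-1.0-...` filename is made up for illustration.

```python
def wheel_tags(filename):
    """Split a wheel filename into (python_tag, abi_tag, platform_tag).

    Assumes the standard layout:
    name-version[-build]-python_tag-abi_tag-platform_tag.whl
    """
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel file: {filename}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) not in (5, 6):
        raise ValueError(f"unexpected wheel filename layout: {filename}")
    python_tag, abi_tag, platform_tag = parts[-3:]
    return python_tag, abi_tag, platform_tag


def looks_linux_installable(filename):
    """Rough check: does any platform tag target Linux or any platform?

    A platform tag can hold several dot-separated alternatives.
    """
    platform_tag = wheel_tags(filename)[2]
    return any(
        tag == "any" or tag.startswith(("linux", "manylinux"))
        for tag in platform_tag.split(".")
    )


# The wheel from the logs in the question targets 64-bit Windows.
print(wheel_tags("pyodbc-4.0.32-cp310-cp310-win_amd64.whl"))
print(looks_linux_installable("pyodbc-4.0.32-cp310-cp310-win_amd64.whl"))

# Hypothetical Linux wheel name, for comparison.
print(looks_linux_installable("example-1.0-cp39-cp39-manylinux2014_x86_64.whl"))
```

Passing this check is necessary but not sufficient: the Python tag (cp310 in the question's wheel) also has to match the Python version the job actually runs.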

1 Answer

I tried to follow the guides [1] and [2] in the official AWS documentation, but I ran into issues when importing some libraries, such as psycopg2. I finally managed to import the libraries I needed by following the steps in this tutorial from the AWS blog [3]. The blog is in Spanish, but you can probably get by with a translation.

Basically, they create a setup.py script in which they define the required libraries. They then generate a .whl file from it and upload that file to an S3 bucket, from which the Glue Python shell job gets the required libraries.
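
As a rough illustration of the approach described above (not the blog's exact code), the sketch below generates a minimal setup.py that only declares dependencies. The package name `glue_deps`, the version, the `pandas` requirement, and the helper names `render_setup_py` and `write_setup_py` are all assumptions made for the example; `setuptools.setup` and its `install_requires` argument are the real setuptools API.

```python
from pathlib import Path


def render_setup_py(name, version, requirements):
    """Return the text of a minimal setup.py that only declares dependencies."""
    lines = [
        "from setuptools import setup",
        "",
        "setup(",
        f"    name={name!r},",
        f"    version={version!r},",
        "    packages=[],  # no code of our own, only dependencies",
        f"    install_requires={list(requirements)!r},",
        ")",
    ]
    return "\n".join(lines) + "\n"


def write_setup_py(directory, name, version, requirements):
    """Write the generated setup.py into `directory` and return its path."""
    path = Path(directory) / "setup.py"
    path.write_text(render_setup_py(name, version, requirements))
    return path


# Hypothetical values: print the file that would be written.
print(render_setup_py("glue_deps", "0.1", ["pandas"]))
```

With the file written, running `python setup.py bdist_wheel` in that directory (this needs the setuptools and wheel packages) should leave a .whl under `dist/`, which is the file to upload to S3. The declared requirements are still resolved when the job installs the wheel, so a library with compiled extensions needs a Linux-compatible build to be available at that point.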

[1] https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-python-libraries.html#aws-glue-programming-python-libraries-job

[2] https://docs.aws.amazon.com/glue/latest/dg/add-job-python.html#create-python-extra-library

[3] https://aws.amazon.com/es/blogs/aws-spanish/usando-python-shell-y-pandas-en-aws-glue-para-procesar-conjuntos-de-datos-pequenos-y-medianos/

kevin
  • Yeah, I have tried this, but it doesn't work for certain libraries such as pyodbc and SQLAlchemy – srikanth sanapathi Jul 08 '22 at 05:05
  • @srikanthsanapathi Have you figured this out? I am getting the same error. Is it possible to use --extra-files to import just a utility .py file, or does everything need to be compressed? I was able to load a JSON config file just fine, but not a .py file. – Wolfy Oct 03 '22 at 23:21
  • @Wolfy I didn't need to import a .py file. – srikanth sanapathi Oct 04 '22 at 09:22