AWS Glue Spark job error: "ModuleNotFoundError: You need to install pyodbc respectively the AWS Data Wrangler package with the sqlserver"

Question

I am using an AWS Glue Spark job written in Python to sync data from S3 to an on-premises SQL Server. The job uses AWS Data Wrangler (awswrangler), and I attached a pyodbc wheel file to it. When I run the job, I get this error: "ModuleNotFoundError: You need to install pyodbc respectively the AWS Data Wrangler package with the sqlserver extra for using the sqlserver module".

I need help installing pyodbc so that it works with the AWS Data Wrangler package.

I also tried checking for pyodbc with importlib.util.find_spec("pyodbc") before importing it, but I get the same error.

import sys
from awsglue.transforms import *
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job
import importlib
import awswrangler as wr
from awswrangler import _data_types
from awswrangler import _databases as _db_utils
from awswrangler import exceptions
from awswrangler import sqlserver

__all__ = ["connect", "read_sql_query", "read_sql_table", "to_sql"]

_pyodbc_found = importlib.util.find_spec("pyodbc")
if _pyodbc_found:
    import pyodbc

args = getResolvedOptions(sys.argv, ['JOB_NAME'])
sc = SparkContext()
glueContext = GlueContext(sc)
spark = glueContext.spark_session
job = Job(glueContext)
job.init(args['JOB_NAME'], args)

con = wr.sqlserver.connect(connection="glue_to_onprem_test_1", jdbc_driver_version=3.0)
df = wr.sqlserver.read_sql_query(sql="SELECT TOP 10 * FROM dbo.GlueDataSync", con=con)
con.close()

score 0 · Answer 1 · edited Jul 12 '21 at 00:53

Notes for Microsoft SQL Server

awswrangler uses pyodbc to interact with Microsoft SQL Server. To install this package you need the ODBC header files, which can be installed with, for example, one of the following commands (depending on your distribution's package manager):

sudo apt install unixodbc-dev
yum install unixODBC-devel

After installing these header files you can either just install pyodbc or awswrangler with the sqlserver extra, which will also install pyodbc:

pip install pyodbc
pip install awswrangler[sqlserver]

Finally you also need the correct ODBC Driver for SQL Server. You can have a look at the documentation from Microsoft to see how they can be installed in your environment.
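To see which of these pieces is missing in a given environment, a small diagnostic can help. The following is only a sketch: the function name pyodbc_status is made up for illustration, and it assumes that a failed import points to a missing native library such as unixODBC. It uses importlib.util.find_spec to check whether the package is present, then tries the import, then lists the registered drivers with pyodbc.drivers().

```python
import importlib.util


def pyodbc_status():
    """Return a short description of whether pyodbc can be used here.

    Hypothetical helper for illustration; not part of awswrangler.
    """
    # find_spec only tells us the package can be located on sys.path,
    # not that its compiled extension can actually be loaded.
    if importlib.util.find_spec("pyodbc") is None:
        return "pyodbc is not installed in this environment"
    try:
        import pyodbc
    except ImportError as exc:
        # Assumption: this usually means a shared library such as
        # libodbc (unixODBC) is missing on the machine.
        return f"pyodbc is installed but failed to import: {exc}"
    # pyodbc.drivers() lists the ODBC drivers known to the driver manager.
    drivers = pyodbc.drivers()
    if not drivers:
        return "pyodbc imported, but no ODBC drivers are registered"
    return "pyodbc imported; available drivers: " + ", ".join(drivers)


if __name__ == "__main__":
    print(pyodbc_status())
```

Running this at the top of the job and looking at the job log should show whether the wheel was picked up at all, whether the import itself fails, or whether only the SQL Server driver is missing.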

If you want to connect to Microsoft SQL Server from AWS Lambda, you can build a separate Layer that includes the needed ODBC drivers and pyodbc.

If you maintain your own environment, you need to take care of the above steps yourself. Because of this limitation, use of the sqlserver module in Glue jobs is limited, and you need to rely on the functionality that Glue itself provides.
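As a rough illustration of relying on Glue's own functionality, the job could write a DynamicFrame through the Glue connection with glueContext.write_dynamic_frame.from_jdbc_conf instead of going through pyodbc. This is a sketch under assumptions: the helper name write_to_sqlserver and the database name are hypothetical, and it assumes the Glue connection from the question ("glue_to_onprem_test_1") is a JDBC connection that points at the SQL Server instance.

```python
def write_to_sqlserver(glue_context, dynamic_frame, connection_name, database, table):
    """Write a DynamicFrame to SQL Server via a Glue JDBC connection.

    Hypothetical helper for illustration. glue_context is the job's
    GlueContext; connection_name is the name of a Glue connection.
    """
    # Options understood by Glue's JDBC writer: the target database and
    # the target table (optionally schema-qualified).
    connection_options = {"database": database, "dbtable": table}
    return glue_context.write_dynamic_frame.from_jdbc_conf(
        frame=dynamic_frame,
        catalog_connection=connection_name,
        connection_options=connection_options,
    )
```

Inside the job from the question, this would be called with the existing glueContext, a DynamicFrame read from S3, the connection name "glue_to_onprem_test_1", and the target table, for example "dbo.GlueDataSync". The database name has to be supplied by you, since it is not shown in the question.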

Source: https://aws-data-wrangler.readthedocs.io/en/stable/install.html
