
I'm trying to run some insert/update queries against an Azure SQL Server database from a Databricks script using PySpark. I know there is the Spark native connector, but as far as I understand, the only options it offers are to append to or overwrite the table. I need to update the table, and not every row is affected by the update.

I tried using the python pyodbc library but when making the connection with

cnx = pyodbc.connect('DRIVER={};SERVER={};DATABASE={};Trusted_Connection=yes;user={},password={} '.format(driver,jdbcHostname_dev,jdbcDatabase_dev,sql_user,sql_password))

I receive the following error:

('01000', "[01000] [unixODBC][Driver Manager]Can't open lib 'SQL Server' : file not found (0) (SQLDriverConnect)")

I tried the answer marked as the solution to "using pyodbc in azure databrick for connecting with SQL server", but it did not resolve the problem for me.

How can I use pyodbc in databricks?

  • You can use something like this: https://stackoverflow.com/questions/66670313/how-to-run-stored-procedure-on-sql-server-from-spark-databricks-jdbc-python/66799376#66799376 - then you don't need to install ODBC drivers, etc. – Alex Ott Nov 24 '22 at 18:49
  • I will try this, but given the project deadline I will stick with the pyodbc approach, even if I don't like it very much. Thanks for your support – Gabriele Sciurti Nov 25 '22 at 08:47

1 Answer


('01000', "[01000] [unixODBC][Driver Manager]Can't open lib 'SQL Server' : file not found (0) (SQLDriverConnect)")

The above error mainly happens because the Microsoft ODBC driver for SQL Server is not installed on the cluster.

I tried to reproduce the same in my environment and got the below results:


To resolve the above error, please follow this approach:

Run the following commands to install the Microsoft ODBC driver on the Azure Databricks cluster.

%sh
# Add the Microsoft signing key and package repository, then install the ODBC driver.
# Note: the Ubuntu version in the repository path (16.04 here) may need to match your cluster's runtime OS.
curl https://packages.microsoft.com/keys/microsoft.asc | apt-key add -
curl https://packages.microsoft.com/config/ubuntu/16.04/prod.list > /etc/apt/sources.list.d/mssql-release.list
sudo apt-get update
sudo ACCEPT_EULA=Y apt-get -q -y install msodbcsql17
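
As a quick sanity check (not part of the original answer), you can list the drivers pyodbc sees after the install; "ODBC Driver 17 for SQL Server" should appear in the output:

import pyodbc

# List the ODBC drivers registered with unixODBC; after the install above,
# 'ODBC Driver 17 for SQL Server' should be in this list.
print(pyodbc.drivers())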


Now you can verify that Azure Databricks can connect to the SQL server.

Code:

import pyodbc

server = '<Your_server_name>'
database = '<database_name>'
username = '<username>'
password = '<password>'

# Connect using the driver installed above
conn = pyodbc.connect('DRIVER={ODBC Driver 17 for SQL Server};SERVER=' + server + ';DATABASE=' + database + ';UID=' + username + ';PWD=' + password)

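Since the original question was about running insert/update statements, here is a minimal sketch of how that could look once the connection works; the table and column names (dbo.my_table, status, id) are placeholders, not from the original post:

# Run parameterized UPDATE/INSERT statements; table and column names are placeholders.
cursor = conn.cursor()
cursor.execute("UPDATE dbo.my_table SET status = ? WHERE id = ?", ('processed', 42))
cursor.execute("INSERT INTO dbo.my_table (id, status) VALUES (?, ?)", (43, 'new'))
conn.commit()   # pyodbc connections are not autocommit by default
cursor.close()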

– B. B. Naga Sai Vamsi
  • This seems to work, in terms of getting the driver added to the list of pyodbc available ones, but what if one needs to use "SQL Server" instead of "ODBC Driver 17 for SQL Server"? I get a login timeout error when I try to connect with "ODBC Driver 17 for SQL Server", which is why I ask. – spareTimeCoder Jan 04 '23 at 23:12