
I have a requirement to extract data from SQL Server and create .csv files from numerous tables. I wrote a Python script for this that uses a pyodbc/turbodbc connection with the SQL Server ODBC driver. It works fine sometimes, but it disconnects when it hits a large table (over 11M rows), and performance-wise it is very slow. I tried FreeTDS, but it performs about the same as pyodbc.

This is my connection:

pyodbc.connect(Driver='/opt/microsoft/msodbcsql17/lib64/libmsodbcsql-17.5.so.2.1',server=systemname,UID=user_name,PWD=pwd)
import logging

import pandas as pd
import pyodbc

def connect_to_SQL_Server(logins):
    '''Connect to SQL Server.

    Returns a connection object, or None on failure.
    '''
    try:
        hostname = logins['hostname']
        username = logins['sql_username']
        password = logins['snow_password']

        # Also tried: turbodbc with the same driver, FreeTDS
        # (/usr/lib64/libtdsodbc.so, TDS_Version=8.0) with both pyodbc and
        # turbodbc, ODBC tracing enabled, and a DSN-based connection
        # (DSN='MSSQLDEV') -- all performed about the same.
        con = pyodbc.connect(Driver='/opt/microsoft/msodbcsql17/lib64/libmsodbcsql-17.5.so.2.1',
                             server=hostname, UID=username, PWD=password)
        return con

    except pyodbc.Error as error:
        logging.critical(error)
        return None

sqlCon = connect_to_SQL_Server(logins)
sql = 'select * from table'
i = 0
for partial_df in pd.read_sql(sql, sqlCon, chunksize=300000):
    # Each 300k-row chunk goes to its own numbered gzipped file.
    partial_df.to_csv(filenamewithpath + '_' + str(i) + '.csv.gz',
                      compression='gzip', index=False, sep='\x01',
                      header=False, mode='a+')
    i += 1
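One thing worth trying (a sketch only, not tested against this setup; `dump_cursor_to_csv`, `path_prefix`, and `chunk_size` are illustrative names) is to skip the pandas layer entirely and stream rows from the raw DB-API cursor with `fetchmany`, writing each batch straight to a gzipped CSV. This avoids building a DataFrame per chunk, which can cut both memory use and per-chunk overhead:

```python
import csv
import gzip

def dump_cursor_to_csv(cursor, path_prefix, chunk_size=300_000, sep='\x01'):
    """Stream rows from an executed DB-API cursor (e.g. pyodbc) into
    numbered gzipped CSV part files. Returns the list of files written."""
    files = []
    part = 0
    while True:
        rows = cursor.fetchmany(chunk_size)
        if not rows:
            break
        path = f'{path_prefix}_{part}.csv.gz'
        # 'wt' opens the gzip stream in text mode for the csv writer.
        with gzip.open(path, 'wt', newline='') as fh:
            writer = csv.writer(fh, delimiter=sep)
            writer.writerows(rows)
        files.append(path)
        part += 1
    return files
```

Usage would be `cursor.execute(sql)` followed by `dump_cursor_to_csv(cursor, filenamewithpath)`. Whether this is faster than `pd.read_sql` depends on the driver and row width, so it's worth benchmarking on one of the smaller tables first.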

Are there any parameters I can try for performance improvement? Just to let you know, these Python scripts run on a different server than the one hosting SQL Server, and that server is a Linux cloud instance.

  • Can you share some code that reproduces the failure and quantify the slow performance? – David Browne - Microsoft Jul 22 '20 at 22:41
  • 1
    If you want speed why aren't you using bcp.exe? – AlwaysLearning Jul 22 '20 at 22:51
  • Thanks for the responses, I have updated the post with code snippets. – Data Engineer Jul 22 '20 at 23:13
  • Can bcp.exe generate multiple files with a certain number of rows each? Is it better than pyodbc? I saw a few issues with it when I tried it a couple of years back. – Data Engineer Jul 22 '20 at 23:24
  • @DataEngineer - why not use BCP to export the data, then use Python to chunk the file up? – gvee Jul 26 '20 at 10:59
  • Yes, I tried, but I wasn't successful. Code:
        import bcp
        con = bcp.Connection(driver='ODBC Driver 18 for SQL Server', host='hostname', username='username', password='password')
        my_bcp = bcp.BCP(con)
        file = bcp.DataFile(file_path='/storage/folder', delimiter=',')
        my_bcp.dump(query='select * from tablename', output_file=file)
    – Data Engineer Jul 28 '20 at 11:56
  • Error:
        Traceback (most recent call last):
          File "bcp_test.py", line 4, in <module>
            con = bcp.Connection(driver='ODBC Driver 18 for SQL Server', host='hostname', username='username', password='password')
          File "/home/ec2-user/.local/lib/python3.7/site-packages/bcp/connections.py", line 81, in __init__
            self.driver = driver
          File "/home/ec2-user/.local/lib/python3.7/site-packages/bcp/connections.py", line 90, in driver
            raise DriverNotSupportedException
        bcp.exceptions.DriverNotSupportedException
    – Data Engineer Jul 28 '20 at 11:59
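Since the Python `bcp` wrapper rejected the driver, the native `bcp` command-line utility (shipped with the `mssql-tools` package on Linux) is an alternative worth considering. A sketch of the invocation; the server name, credentials, table, and output path are placeholders, and it needs a live SQL Server to run against:

```shell
# -c: character mode, -t: field terminator, -S/-U/-P: server and credentials
bcp "select * from tablename" queryout /storage/folder/tablename.csv \
    -S hostname -U username -P password -c -t,
```

`bcp` writes one file per invocation, so splitting into fixed-size parts would still need a follow-up step (e.g. `split -l`), as suggested in the comments above.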

0 Answers