
I am using pyodbc in Python 2.7 to insert some Unicode characters into an nvarchar column in SQL Server, but I am not successful. I am not very familiar with SQL Server.

My code:

import pyodbc
import csv
import string
import sys
import codecs
import os
import datetime
import time

def main(argv):

        target_table = 'staging.sf.Account_test'
        try:
                reload(sys)
                sys.setdefaultencoding('utf-8')

                # sql server connection
                sql_cnx = get_sql_server_connection()
                sql_cur=sql_cnx.cursor()
                sql_cur.execute('SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED')

                # read from file
                res_file_path = '/myfolder/test_double_byte_file.dat'

                # write to sql server
                write_data_to_sql_server(res_file_path, sql_cur, sql_cnx, target_table)

        except:
                e = sys.exc_info()
                print "exception: %s" %  e[1]
        finally:
                if 'sql_cnx' in locals():
                        sql_cnx.close()


def get_sql_server_connection():
        connstring="driver=freeTDS;server=server;database=staging;uid=user;pwd=pwddd;port=1433;TDS_Version=8.0;CHARSET=UTF8;"
        cnx=pyodbc.connect(connstring)
        cnx.autocommit=True
        return cnx

def write_data_to_sql_server(file_path, sql_cur, sql_cnx, target_table):
        try:
                f = codecs.open(file_path, 'r', encoding='utf-8')
                csvread = csv.reader(f, delimiter='^')
                lines=list(csvread)
                values=""
                numcols=len(lines[0])
                wildcard='?,'
                for i in range(numcols):
                        values = values + wildcard

                values = values.strip(',')
                insertstatement = "insert into "+target_table+" values("+values+")"
                print insertstatement

                sql_cur.execute("DELETE FROM " + target_table)
                #sql_cnx.setdecoding(pyodbc.SQL_CHAR, encoding='utf-8', to=str)
                #sql_cnx.setdecoding(pyodbc.SQL_WCHAR, encoding='utf-8', to=unicode)
                #sql_cnx.setencoding(str, encoding='utf-8')
                #sql_cnx.setencoding(unicode, encoding='utf-8')


                print "nlines:- "+str(len(lines))
                #print "sleeping..."
                #time.sleep(60)

                batch = 20
                while len(lines) > 0 :
                        rowarr = []
                        for row in lines[:batch]:
                                print "line:-"+ str(row)
                                tup = ()
                                col_cnt = 0
                                for col in row:
                                        tup = tup + (str(row[col_cnt]).encode('utf-8'),)
                                        print str(row[col_cnt]).encode('utf-8')
                                        col_cnt = col_cnt + 1
                                rowarr.append(tup)
                        sql_cur.executemany(insertstatement, rowarr)
                        print 'commiting..'
                        sql_cnx.commit()
                        del lines[:batch]

        except:
                e = sys.exc_info()
                print "exception: %s" %  e[1]



if __name__ == "__main__": main(sys.argv[1:])

Data file - test_double_byte_file.dat

6-1-12, TANASHICHO^.^2017-08-09T21:02:26.000+0000
代々木2-28-7NTビル1階^株式会社ネクサスご担当者^2017-10-04T05:09:36.000+0000
北青山2-8-35^経理部^2017-08-09T21:03:12.000+0000
渋谷2-12-12 三貴ビル8F^経理部^2017-08-06T16:09:34.000+0000
南青山2-11-16 METLIFE青山ビル^経理部^2017-08-31T01:01:10.000+0000
3-2 TRⅡビル4F^茂代 髙月^2017-08-06T16:09:34.000+0000
中区丸の内3-21−21丸の内東桜ビル805^経理部^2017-08-19T04:01:41.000+0000
神宮前1-3-12ジブラルタ生命原宿ビル2F^UNKNOWN^2017-08-09T21:03:16.000+0000
銀座7-17-14松岡銀七ビル 5F^経理部^2017-09-25T10:20:47.000+0000
港南1-9-1NTT品川TWINSビル^経理部^2017-08-09T21:03:16.000+0000
日本橋本町1-9-4ヒューリック日本橋本町1町名ビル8階^経理部^2017-08-09T21:03:16.000+0000

Error:

exception: ('HY000', '[HY000] [FreeTDS][SQL Server]Unknown error (0) (SQLExecDirectW)')

Not sure what I am missing...

The first line in the data file, which doesn't contain any Unicode characters, can be inserted without errors, but the other lines throw the exception.

Any help here is appreciated.

marc_s
PraveenK
1 Answer


Your code returns the text values from the data file as str objects containing UTF-8-encoded bytes ...

f = codecs.open(file_path, 'r', encoding='utf-8')
csvread = csv.reader(f, delimiter='^')
lines=list(csvread)
item = lines[1][0]
print(repr(item))
# '\xe4\xbb\xa3\xe3\x80\x85\xe6\x9c\xa82-28-7NT\xe3\x83\x93\xe3\x83\xab1\xe9\x9a\x8e'

... and calling encode on them has no effect:

encoded_item = item.encode('utf-8')
print(repr(encoded_item))
# '\xe4\xbb\xa3\xe3\x80\x85\xe6\x9c\xa82-28-7NT\xe3\x83\x93\xe3\x83\xab1\xe9\x9a\x8e'

What you really want to do is decode them into proper Unicode objects ...

decoded_item = item.decode('utf-8')
print(repr(decoded_item))
# u'\u4ee3\u3005\u67282-28-7NT\u30d3\u30eb1\u968e'

... and pass those Unicode objects as the parameter values. That's because SQL Server does not use UTF-8 encoding – it uses UTF-16LE – and pyodbc can take a Unicode object (which is independent of any particular encoding) and pass it to the ODBC driver in an acceptable format.
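Applied to the loop in the question, that means building each parameter tuple from decoded values instead of re-encoded ones. A minimal sketch (`decode_row` is a hypothetical helper, and the byte strings below just mimic what `csv.reader` hands back in Python 2 when fed a UTF-8 file):

```python
def decode_row(row, encoding='utf-8'):
    """Decode any UTF-8 byte strings in a row into proper Unicode objects."""
    return tuple(col.decode(encoding) if isinstance(col, bytes) else col
                 for col in row)

# UTF-8 bytes, as csv.reader returns them in Python 2
raw_row = [b'\xe7\xb5\x8c\xe7\x90\x86\xe9\x83\xa8',
           b'2017-08-09T21:03:12.000+0000']
params = decode_row(raw_row)
print(repr(params[0]))  # the first cell is now a Unicode object
```

The resulting tuples can go straight into `sql_cur.executemany(insertstatement, rowarr)` with no `encode` call anywhere.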

BTW, changing the default encoding via ...

reload(sys)
sys.setdefaultencoding('utf-8')

... is considered risky because it can break things that rely on the actual default encoding in Python 2 ('ascii'). It would be better to read Unicode "CSV" files using a method that

  1. doesn't hack the default encoding for Python 2, and
  2. returns proper Unicode objects.

There are examples in the csv documentation for Python 2.
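For this particular file there is no quoting to worry about, so one workable approach (a sketch, not the full `UnicodeReader` recipe from the docs) is to read the file in binary mode, decode each line, and split on the delimiter yourself:

```python
def unicode_rows(byte_lines, delimiter=u'^', encoding='utf-8'):
    """Yield rows of Unicode cells from an iterable of encoded lines."""
    for raw in byte_lines:
        line = raw.decode(encoding).rstrip(u'\r\n')
        yield line.split(delimiter)

# byte_lines would normally come from open(file_path, 'rb');
# this sample line is the UTF-8 encoding of one row from the data file
sample = [b'\xe5\x8c\x97\xe9\x9d\x92\xe5\xb1\xb12-8-35'
          b'^\xe7\xb5\x8c\xe7\x90\x86\xe9\x83\xa8\n']
for row in unicode_rows(sample):
    print(repr(row))
```

If the data ever does contain quoted fields, the `UnicodeReader` recipe in the Python 2 csv documentation handles that properly.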

Gord Thompson