
I have the following dataframe in pandas

[screenshot of the dataframe]

I need to insert all the values into a data warehouse, but the Chinese characters are inserted as junk (?????) or as mojibake (百å¨è‹±åšï¼ˆèˆŸå±±ï¼‰å•¤é…’有é™å…¬å¸), like the one above. The insert query is prepared dynamically. I need help handling the following scenario:

Read the file as UTF-8 and write it into the data warehouse over a pyodbc connection using the UTF-8 character set.

df = pd.read_csv(filename, dtype='str', encoding='UTF-8')
cnxn = database_connect()  # connect to database
cnxn.setencoding(ctype=pyodbc.SQL_CHAR, encoding='UTF-8')
cnxn.autocommit = True
cursor = cnxn.cursor()
for y in range(len(df)):
    inst = 'insert into ' + tablename + ' values ('
    inst_val = ''  # initialize before the first concatenation
    for x in range(len(clm)):
        if str(df.iloc[y, x]) == 'nan':
            df.iloc[y, x] = ''
        value = str(df.iloc[y, x]).strip().replace("'", '')
        if x != len(clm) - 1:
            inst_val = inst_val + "'" + value + "'" + ","
        else:
            inst_val = inst_val + "'" + value + "'" + ")"
    inst = inst + inst_val  # prepare insert statement from the in-memory values
    print("Inserting value into table")
    try:
        cursor.execute(inst)  # execute insert statement
        print("1 row inserted")
    except Exception as e:
        print(inst)
        print(e)

The same values, with the Chinese characters intact, should be inserted into the SQL data warehouse.

  • Have you already checked how your data looks after reading it with pandas? Is it displayed correctly? Are you sure your source is UTF-8? AFAIK, other encodings such as UTF-16 are often used for Asian languages, because each character (except the usual Arabic digits and characters like spaces) needs more than one byte, so UTF-8 isn't really a perfect representation for such languages and UTF-16 or higher would be a better match. – jottbe Aug 30 '19 at 06:51
  • Maybe you can verify that it is indeed UTF-8, because if it is UTF-16 or higher, that could explain why you get this character mess. – jottbe Aug 30 '19 at 06:53
  • Yes, I printed it after loading it into the dataframe and it has the Chinese characters. I tried UTF-16, but while loading into SQL Server it throws an error. – KARTHI KEYAN Aug 30 '19 at 07:00
  • Sorry, I think I can't help you. If the characters are displayed correctly after reading them with `UTF-8`, I guess it is indeed `UTF-8`. You could maybe check what happens if you write a portion of your SQL script to a file and run it on SQL Server via other tools. If that also doesn't work (with and without a BOM), I would suspect a bug in SQL Server. But if it works, probably something in the db connection settings is wrong (or a bug in pyodbc). – jottbe Aug 30 '19 at 08:08
  • OK, let me check and get back. Thanks for your response. – KARTHI KEYAN Aug 30 '19 at 08:20
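For what it's worth, the mojibake in the question looks like a classic symptom of UTF-8 bytes being decoded with a single-byte codepage. A minimal sketch (pure Python, no database needed; the sample string is a stand-in, not the questioner's data) that reproduces the effect and checks whether a byte string really is valid UTF-8:

```python
# UTF-8 bytes for a Chinese string, as they might arrive from a CSV file.
raw = "你好".encode("utf-8")

# Decoding those bytes as Latin-1 produces the kind of junk shown
# in the question: each UTF-8 byte becomes one garbage character.
mojibake = raw.decode("latin-1")
print(mojibake)

# Checking whether the bytes really are valid UTF-8:
try:
    raw.decode("utf-8")
    is_utf8 = True
except UnicodeDecodeError:
    is_utf8 = False
print(is_utf8)  # True
```

If a file decodes cleanly as UTF-8 but still comes out garbled in the database, the corruption is happening on the database side (column type or connection encoding), not in pandas.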

1 Answer


You are using dynamic SQL to construct string literals containing Chinese characters, but you are creating them as

insert into tablename values ('你好')

when SQL Server expects Unicode string literals to be of the form

insert into tablename values (N'你好')

You would be better off using a proper parameterized query to avoid such issues:

sql = "insert into tablename values (?)"
params = ('你好',)
cursor.execute(sql, params)
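Applied to the question's loop, a fuller sketch (with a hypothetical `tablename` and sample dataframe, since the real schema isn't shown) that builds one parameterized statement and lets pyodbc handle the quoting and Unicode conversion:

```python
import pandas as pd

# Sample data standing in for the CSV the question reads;
# one cell is missing to mimic the NaN handling in the original loop.
df = pd.DataFrame({"name": ["你好", None], "city": ["舟山", "北京"]})

# Replace missing values with empty strings once, up front,
# instead of checking each cell for the string 'nan'.
df = df.fillna("")

# One "?" placeholder per column; no manual quoting, stripping of
# apostrophes, or N'...' prefixing is needed.
placeholders = ", ".join("?" * len(df.columns))
sql = f"insert into tablename values ({placeholders})"

# Each row becomes a tuple of parameters.
params = [tuple(row) for row in df.itertuples(index=False)]

print(sql)        # insert into tablename values (?, ?)
print(params[0])  # ('你好', '舟山')

# With a live connection one would then run:
#   cursor.executemany(sql, params)
#   cnxn.commit()
```

Besides fixing the character-set problem, this also removes the SQL-injection risk of the string-concatenated version, and `executemany` sends the whole batch instead of one statement per row.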
Gord Thompson