I am working on moving text from a CSV file to an Oracle database, and I have built a Python script for this. One field in the CSV is in Spanish. I know there are hundreds of articles about this on Stack Overflow, but I have spent the last 4 hours on it and have been unable to fix the issue.

Sample text: LATAM PMR - MS Diálisis

I used chardet to detect the encoding, and it reported {'confidence': 0.99, 'encoding': 'TIS-620'}.

So I went ahead and changed Python's default character encoding to TIS-620, as suggested in one of the posts:

import sys
stdin, stdout = sys.stdin, sys.stdout
reload(sys)
sys.stdin, sys.stdout = stdin, stdout
sys.setdefaultencoding('TIS-620')

But I am still getting this output:

'LATAM PMR - MS Di\xc3\xa1lisis'
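
For context, the two escaped bytes in that output, `\xc3\xa1`, are the UTF-8 encoding of `á`, which suggests the file is UTF-8 and that the value shown is a byte string rather than decoded text. A minimal sketch, assuming the bytes are exactly as printed above, compares decoding them as UTF-8 with decoding them as TIS-620 (the variable names are illustrative only):

```python
# -*- coding: utf-8 -*-
# The bytes exactly as shown in the output above.
raw = b'LATAM PMR - MS Di\xc3\xa1lisis'

# Decoding as UTF-8 turns the two bytes C3 A1 into the single character U+00E1.
text = raw.decode('utf-8')

# Decoding the same bytes as TIS-620 maps each byte to a Thai letter instead,
# which is why the detected encoding does not fit Spanish text.
thai_guess = raw.decode('tis-620')

print(repr(text))
print(repr(thai_guess))
```

If the UTF-8 result contains `á` as a single character, the data is most likely UTF-8 and the detected `TIS-620` is a misdetection.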

I also tried cp1252 and latin-1, but nothing works; I keep getting the same output as above:

a = 'LATAM PMR - MS Diálisis'
a.encode('cp1252')
a.encode('latin-1')
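
One thing worth checking here is the direction of the conversion: `encode` goes from text to bytes, so bytes read from a file usually need to be decoded first. The sketch below assumes the source bytes are UTF-8 (an assumption based on the `\xc3\xa1` sequence in the output, not something confirmed for the real file):

```python
# -*- coding: utf-8 -*-
# Assumed input: UTF-8 bytes as they would come out of the CSV file.
raw = b'LATAM PMR - MS Di\xc3\xa1lisis'

# Step 1: bytes -> text, using the encoding the bytes were written in.
text = raw.decode('utf-8')

# Step 2: text -> bytes in a target encoding, only if a byte string is needed.
as_cp1252 = text.encode('cp1252')
as_latin1 = text.encode('latin-1')

# Encoding back to UTF-8 reproduces the original bytes.
round_trip = text.encode('utf-8')

print(repr(as_cp1252))
print(repr(as_latin1))
```

In both cp1252 and latin-1 the character `á` is the single byte `\xe1`, so the two encoded results are identical for this sample.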

Can you please help me debug this issue?

I want the text to be loaded into the Oracle database as LATAM PMR - MS Diálisis.

gowtham Y.R
  • There are many places where this could go wrong. First, you have to be sure that you are reading the CSV correctly. I wouldn't use defaultencoding. Read the file and convert it to unicode instead: `data = file_contents.decode(encoding)`. Then make sure that the unicode is correct. Character á should be `u'\xe1'`. Once you have that figured out, then think about writing it to Oracle. You'll probably have to configure the database, the drivers and your Python code to make that work. – zvone Sep 09 '16 at 23:52
  • Have you watched [Pragmatic Unicode, or, How do I stop the pain?](https://www.youtube.com/watch?v=sgHbC6udIqc) – wwii Sep 10 '16 at 01:10
  • Do you think that Latin or Spanish text has **anything** to do with the **Thai** Industrial Standard `TIS-620` encoding? What is the hexadecimal value of the `iál` substring in your `csv` file? (Use any hexadecimal viewer/editor.) – JosefZ Sep 11 '16 at 20:37
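
The two checks suggested in the comments (inspect the raw bytes, then decode on read and verify the character) can be sketched roughly as follows. This assumes Python 3 and a UTF-8 file; the temporary file, its `id,name` columns, and the variable names are hypothetical stand-ins for the real CSV:

```python
import binascii
import csv
import io
import os
import tempfile

# Hypothetical sample file standing in for the real CSV (assumed to be UTF-8).
sample = u'id,name\n1,LATAM PMR - MS Di\xe1lisis\n'.encode('utf-8')
fd, path = tempfile.mkstemp(suffix='.csv')
with os.fdopen(fd, 'wb') as f:
    f.write(sample)

# Check 1: hex value of the "iál" substring in the raw file bytes.
with open(path, 'rb') as f:
    raw = f.read()
start = raw.index(b'Di') + 1          # position of the "i" before the accent
hex_ial = binascii.hexlify(raw[start:start + 4]).decode('ascii')
print(hex_ial)  # 69c3a16c: "i", then C3 A1 (UTF-8 for á), then "l"

# Check 2: decode while reading, then confirm á arrived as u'\xe1'.
with io.open(path, encoding='utf-8', newline='') as f:
    rows = list(csv.reader(f))
name = rows[1][1]
print(u'\xe1' in name)

os.remove(path)
```

If the hex dump of the real file shows `e1` instead of `c3a1` at that position, the file is more likely latin-1 or cp1252, and the `encoding` argument would need to change accordingly. Writing the decoded text to Oracle is a separate step that depends on the driver and database character set settings.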
