
I have Unicode data that I want to write to a file, using Python 2.6. I can print the encoded values, but I cannot get them written to a file in the encoding I ask for. The default encoding for the environment is UTF-8. I tried `codecs` as well, without success. Here is a sample snippet:

#!/usr/bin/python
import sys
import codecs
import csv

sh = [u'T\xe9l\xe9vista S.A.', u'T\xe9l\xe9vista S.A.', 'Python']
print sys.stdout.encoding
f = codecs.open('listwrite.txt', 'w', encoding='latin-1')
for item in sh:
  f.write(item)
f.close()

for i in sh:
  print i.encode('latin-1')
#

Output:

UTF-8
Télévista S.A.
Télévista S.A.
Python

Contents of listwrite.txt
Télévista S.A.Télévista S.A.Python
#

As seen above, the file appears to be written in UTF-8 rather than Latin-1. How do I override the default encoding for the file?

Edit 2:

Also, writing using a csv writer gives UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position 1: ordinal not in range(128)

Code below:

#!/usr/bin/python
import sys
import codecs
import csv

sh = [u'T\xe9l\xe9vista S.A.', u'T\xe9l\xe9vista S.A.', 'Python']
print sys.stdout.encoding
f = codecs.open('listwrite.txt', 'w', encoding='latin-1')
c = csv.writer(f, quoting=csv.QUOTE_NONE)
c.writerow(sh)
f.close()

for i in sh:
  print i.encode('latin-1')
user1827064
  • I cannot reproduce this with Python 2.6. For me, `listwrite.txt` contains Latin-1 encoded data when I run your example code. How are you verifying the contents? – Martijn Pieters Nov 15 '12 at 15:30
  • I know one way is to change the default encoding in site.py. I don't want to use that road. Is any other workaround possible to just write to the file using latin-1 on the fly? It would be very helpful. – user1827064 Nov 15 '12 at 15:36
  • Just doing a cat or seeing it in vi! – user1827064 Nov 15 '12 at 15:36
  • Sorry, doing a cat on the file has latin-1 but opening using vi has UTF-8. I am confused now. – user1827064 Nov 15 '12 at 15:38
  • Try looking at it outside of a terminal (any text editor, etc.) - depending on your setup, you will get a different result when viewing output in the terminal as opposed to a non-terminal text editor, etc. – RocketDonkey Nov 15 '12 at 15:38
  • No, the default encoding in site.py only applies to automatic conversions, but when using `codecs.open` you specify an explicit encoding. Besides, my default is still ASCII (and it should stay that way). – Martijn Pieters Nov 15 '12 at 15:38
  • @RocketDonkey is correct. When I open it in some other editor, I can see latin encoding. I am wondering why vi shows UTF-8. Anyways I think this should work for me. I had read in a few posts that the default should be ASCII, but mine is UTF-8. Not sure how or who changed it. – user1827064 Nov 15 '12 at 15:44
  • Is your output reasonable? If `sys.stdout.encoding` is `UTF-8`, how can you get the print result of `latin-1` encoded characters properly? – Reorx Nov 15 '12 at 16:20
  • @Reorx: Sorry, did not get you. Yes, sys.stdout.encoding is UTF-8. But am able to print latin-1 using encoding on standard output. – user1827064 Nov 15 '12 at 17:06
  • @MartijnPieters: Can you guys please look at Edit 2 and let me know why it fails? – user1827064 Nov 15 '12 at 18:33
  • @user1827064: When using the `csv` module, you need to encode to bytes yourself. – Martijn Pieters Nov 15 '12 at 18:50
  • Closed? Are all Python 2.x questions being closed now? Or is latin1 too small a geographical region? What am I missing, please help me understand. – SHernandez Feb 05 '15 at 22:49

1 Answer


I think you're attacking the problem from the wrong angle. The Python 2 `csv` module does not handle Unicode input; it expects byte strings. Try encoding each row before writing instead:

import csv
sh = [u'T\xe9l\xe9vista S.A.', u'T\xe9l\xe9vista S.A.', 'Python']

f = open('listwrite.txt', 'wb') # binary mode
writer = csv.writer(f)
writer.writerow([item.encode('latin-1') for item in sh])
f.close()

Now you have a proper latin1-encoded file:

$ cat listwrite.txt | iconv -f latin1
Télévista S.A.,Télévista S.A.,Python
$ file listwrite.txt 
listwrite.txt: ISO-8859 text, with CRLF line terminators
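
As for the first part of the question: a terminal or editor may re-decode whatever it displays, so looking at the raw bytes is a more reliable check than `cat` or `vi`. Here is a minimal sketch of that check, assuming the same `codecs.open` call as in the question. The temporary directory and the `is_latin1` name are only illustrative and not part of the original code; a newline is also added after each item so the entries do not run together.

```python
import codecs
import os
import tempfile

sh = [u'T\xe9l\xe9vista S.A.', u'Python']

# Hypothetical scratch location (an assumption for this sketch),
# so the check does not overwrite a real listwrite.txt.
path = os.path.join(tempfile.mkdtemp(), 'listwrite.txt')

# Write the items as Latin-1, one per line.
f = codecs.open(path, 'w', encoding='latin-1')
for item in sh:
    f.write(item + u'\n')
f.close()

# Read the file back in binary mode so nothing decodes it on the way.
f = open(path, 'rb')
raw = f.read()
f.close()

# In Latin-1, e-acute is the single byte 0xE9; in UTF-8 it is 0xC3 0xA9.
is_latin1 = b'\xe9' in raw and b'\xc3\xa9' not in raw
print(repr(raw))
print(is_latin1)
```

If `is_latin1` comes out true, the file on disk really is Latin-1, and any UTF-8 you see is the viewer's interpretation rather than what was written.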
9000