
Hi, I am writing a script that parses text from the web, and my output file must be "Non-ISO extended-ASCII text, with CRLF, LF line terminators". How can I write the file in this code page?

def save_file(potoczek,  nazwapliku):
    file = open(nazwapliku,"w")
    file.write(potoczek)
    file.close()
    return()
#...
t1='"'+tab[0]+'\"\n'+naglowek+wykli(zawartosc0).encode('latin2')
t2='"'+tab[1]+'\"\n'+naglowek+wykli(zawartosc1).encode('latin2')
t3='"'+tab[2]+'\"\n'+naglowek+wykli(zawartosc2).encode('latin2')
t4='"'+tab[3]+'\"\n'+naglowek+wykli(zawartosc3).encode('latin2')

TRAKTOR = t1+t2+t3+t4

print udata
save_file(TRAKTOR,PLIK_SCIEZKA)
Dzaczek
  • "Non-ISO extended-ASCII" doesn't really specify any specific encoding. What do you mean? "latin2" is an alias for [ISO-8859-2](http://en.wikipedia.org/wiki/ISO_8859-2), so it's definitely not "Non-ISO". – Joachim Sauer Sep 04 '12 at 06:39
  • The old file, checked with the `file` command in a bash console: `file ~/stare_ceny/jacek/stare_ceny/2012-07-02.csv /home/jacek/stare_ceny/2012-07-02.csv: Non-ISO extended-ASCII text, with CRLF, LF line terminators`. latin2 is an alias for ISO-8859-2. The new file: `jacek@R2D2:~/skrypty$ file /var/usterki/ceny_2012-09-04_08-11-46.csv /var/usterki/ceny_2012-09-04_08-11-46.csv: UTF-8 Unicode text` – Dzaczek Sep 04 '12 at 06:47

1 Answer


The general rule in this case is "Use Unicode internally; encode at the I/O boundaries" -- you could easily extend that philosophy to the line endings as well.

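import io # for Python 2.6+; not needed in Python 3

def save_file(potoczek,  nazwapliku): # data, filename
    # Text mode: io.open encodes to latin2 and turns each "\n" into "\r\n" on write.
    # potoczek must be a unicode string, so do not call .encode() on it first.
    file = io.open(nazwapliku, mode="w", encoding="latin2", newline="\r\n") # in Python 3, just "open"
    file.write(potoczek)
    file.close()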

Then just build up your data as regular unicode strings, separated by standard newlines (\n), and the save_file function will take care of the required translation.
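A minimal sketch of that approach, assuming the scraped input arrives as UTF-8 bytes. The names `to_text` and `save_latin2`, the sample rows, and the temporary path are hypothetical and only for illustration; they are not part of the original script.

```python
# -*- coding: utf-8 -*-
import io
import os
import tempfile

def to_text(value, encoding="utf-8"):
    """Decode byte strings at the input boundary; pass unicode through unchanged."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value

def save_latin2(text, path):
    """Write a unicode string as ISO-8859-2 (latin2) with CRLF line endings."""
    with io.open(path, mode="w", encoding="latin2", newline="\r\n") as f:
        f.write(text)

# Hypothetical scraped input: UTF-8 bytes, as a downloaded page would be.
raw = u"Łódź;12,50\nŻywiec;3,20\n".encode("utf-8")

# Build the output as unicode, using plain "\n" between lines.
text = u'"naglowek"\n' + to_text(raw)

path = os.path.join(tempfile.mkdtemp(), "ceny.csv")
save_latin2(text, path)

# Read the raw bytes back to inspect what was actually written.
with io.open(path, "rb") as f:
    data = f.read()
```

Decoding once on input and encoding once on output means no byte string ever has `.encode()` called on it, which is the usual cause of an implicit ASCII decode error in Python 2. Any character that latin2 cannot represent will still raise `UnicodeEncodeError` on write, so the `errors` argument of `io.open` may be worth considering if the source text is not guaranteed to fit.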

Ian Clelland
  • I get this error :/ `Traceback (most recent call last): File "skrypt.py", line 182, in save_file(udata,PLIK_SCIEZKA) File "skrypt.py", line 108, in save_file file.write(potoczek.encode('latin2')) File "/usr/lib/python2.6/encodings/iso8859_2.py", line 12, in encode return codecs.charmap_encode(input,errors,encoding_table) UnicodeDecodeError: 'ascii' codec can't decode byte 0xc5 in position 98: ordinal not in range(128) ` – Dzaczek Sep 04 '12 at 06:59