File downloaded with wget has wrong encoding

Question

We have a page on our intranet which is just plain/text encoded in UTF-8 and it contais insert scripts for database. And problem is, when I download this page to a file with wget or curl, my downloaded file has UTF-8 encoding but special characters(Czech language chars) are broken.

So where could be a problem?I can convert it to CP-1250 encoding which works fine(I also tried ISO8859-2, it doesn't work either), but I can't use it for db insert via SQL*Plus, because our DB is encoded in UTF-8.

Thanks a lot for answers

I think your file is downloaded correctly. Nor the web server, nor wget, nor curl ever modify the files. So first check your file is okay on the web server and/or compare the checksum of the original file and the checksum of the downloaded file. — Gregory MOUSSAT, Mar 16 '12 at 11:31
How did you check that the file is broken? (Which program / editor?) — Andreas Florath, Mar 23 '12 at 17:31

score -1 · Answer 1 · answered Mar 16 '12 at 11:29

-1

The HTTP protocol communicates in US_ASCII.

Therefore, plain/text content CAN NOT be UTF-8: it must be encoded if it contains non-ASCII UTF-8 characters.

answered Mar 16 '12 at 11:29

adaptr

16,576
23
34

Sorry - but this is complete nonsense: please consult RFC 2616, Section 3.6. – Andreas Florath Mar 23 '12 at 17:29
Transfer-coding ? How does that apply ? – adaptr Mar 23 '12 at 17:57
_However, safe transport has a different focus for an 8bit-clean transfer protocol._ RFC 2616, Section 3.6 – Andreas Florath Mar 23 '12 at 18:01

File downloaded with wget has wrong encoding

1 Answers1