
We have a page on our intranet which is served as text/plain encoded in UTF-8, and it contains insert scripts for a database. The problem is that when I download this page to a file with wget or curl, the downloaded file has UTF-8 encoding but the special characters (Czech language chars) are broken.

So where could the problem be? I can convert the file to CP-1250 encoding, which works fine (I also tried ISO-8859-2, but that doesn't work), but I can't use CP-1250 for the DB insert via SQL*Plus, because our DB is encoded in UTF-8.
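One way to narrow this down is to inspect the raw bytes instead of trusting an editor's display. A minimal sketch, assuming the page lives at http://intranet/inserts.sql (a placeholder URL) and standard GNU tools are available:

    # download without any transformation (URL is a placeholder)
    wget -O inserts.sql http://intranet/inserts.sql

    # inspect the raw bytes; in correct UTF-8 the Czech "ř" is the
    # byte pair c5 99, while in CP-1250 it is the single byte f8
    xxd inserts.sql | head

    # ask file(1) what encoding it guesses
    file inserts.sql

    # if the bytes turn out to be CP-1250 after all, convert for SQL*Plus
    iconv -f CP1250 -t UTF-8 inserts.sql > inserts-utf8.sql

If xxd shows the correct UTF-8 byte sequences, the download itself is fine and the breakage is in whatever program is displaying the file.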

Thanks a lot for any answers.

Petr Mensik
  • I think your file is downloaded correctly. Neither the web server, nor wget, nor curl ever modifies the files. So first check that the file is okay on the web server, and/or compare the checksum of the original file with the checksum of the downloaded file (see the sketch after these comments). – Gregory MOUSSAT Mar 16 '12 at 11:31
  • How did you check that the file is broken? (Which program / editor?) – Andreas Florath Mar 23 '12 at 17:31
  • @AndreasFlorath How did you solve this problem? – SJU Jun 17 '14 at 17:26
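Following up on the checksum suggestion in the first comment, a minimal sketch (assuming shell access to the web server; the server-side path is a placeholder):

    # on the web server
    md5sum /var/www/inserts.sql

    # on your machine, after downloading
    md5sum inserts.sql

    # identical sums mean the transfer is byte-for-byte intact, so any
    # "broken" characters are a display/locale problem, not a download problem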

1 Answer


The HTTP protocol transmits its headers in US-ASCII, but the message body can be in any encoding.

Therefore, a text/plain body CAN be UTF-8, as long as the server declares it (Content-Type: text/plain; charset=utf-8). If the charset declaration is missing or wrong, clients may fall back to a default encoding and mangle the non-ASCII characters.
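You can check what the server actually declares by fetching only the response headers; a sketch with curl (the URL is a placeholder):

    # -s: silent, -I: HEAD request, showing response headers only
    curl -sI http://intranet/inserts.sql | grep -i '^Content-Type'
    # expected:  Content-Type: text/plain; charset=utf-8
    # a missing or wrong charset lets clients mis-decode the body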

adaptr