I have two sites I'm developing (in PHP). They are using identical code to provide an XLS export (using PEAR excel) and they are running on the same local server. To rule out a problem with the actual data in the xls, I am just outputting a file with no data for now.

When I export from site A and save the file it's reported as 'ANSI' encoded within Notepad++. This file opens correctly in Excel.

When I export from site B, the file is reported as 'UTF-8' encoded and won't open in Excel. If I convert the file to ANSI or UTF-8 without BOM in Notepad++, it opens just fine in Excel.

The same encoding difference is present between site A and B when I save an arbitrary page on the site, so I think it may be more fundamental than just how the Excel file is being generated (I get the same encoding when exporting CSV/ODS formats). I've compared the HTTP headers between site A and B during the export and they are functionally identical. Explicitly adding charset=ISO-8859-1 to the Content-Type header makes no difference. The Apache virtual hosts are also functionally identical between sites. Both sites are using identical character encodings in their databases (but since I'm not exporting any data right now, this is irrelevant).

What else could be causing this which I haven't accounted for?

Thanks!

UPDATE

The Excel generation is a red herring. I've removed all of that and am now simply outputting the download header and a test string. When saved, the file is still encoded differently between sites. The code which generates the download file seems identical when I diff the various files...

I haven't been able to repeat the problem by creating a simplified test case. When I tried, both sites output files which are saved as ANSI - I don't understand what else could be going on.
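
Since the exports differ even for a plain test string, one way to narrow things down is to compare the two saved files byte by byte and look at where they first diverge. The sketch below is only an illustration: the helper names `first_difference` and `hex_preview` are made up here, and the two byte strings stand in for the real downloads, assuming site B's output carries three extra bytes at the front.

```python
def first_difference(a: bytes, b: bytes):
    """Return the offset of the first differing byte, or None if identical."""
    for offset, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return offset
    # One input is a prefix of the other: they differ where the shorter ends.
    if len(a) != len(b):
        return min(len(a), len(b))
    return None


def hex_preview(data: bytes, count: int = 16) -> str:
    """Format the first `count` bytes as space-separated hex pairs."""
    return " ".join(f"{byte:02x}" for byte in data[:count])


# Hypothetical stand-ins for the two saved downloads.
site_a = b"test string"
site_b = b"\xef\xbb\xbftest string"  # assumed: UTF-8 BOM before the payload

print("first difference at offset:", first_difference(site_a, site_b))
print("site A starts with:", hex_preview(site_a, 4))
print("site B starts with:", hex_preview(site_b, 4))
```

With real files, the byte strings would come from reading each download in binary mode, for example `open(path, "rb").read()`.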

Frank D
  • What do you see in a hex editor? – SLaks May 22 '12 at 14:33
  • assuming the content being used to create the file is identical byte for byte, and that the webserver is not changing the encoding on the fly, it has to be done by the excel generation library. I would start greping teh source codez. – goat May 22 '12 at 14:50
  • The file bytes are different in a hex editor. I can't see why the webserver would change the encoding, as it's the same server with the same vhost setup. The code seems identical when I diff... I wonder if some PHP source files are saved with different encodings, but I haven't found any yet... – Frank D May 22 '12 at 16:31

2 Answers

0

The ANSI "mode" just uses your system's code page to save the data, so you cannot be sure the saved document will be readable by others.

UTF-8 without BOM means UTF-8 without the byte order mark, the three bytes (EF BB BF) that some editors prepend to the start of the file, and which are probably giving Excel a headache.

I always go with the without-BOM approach when i18n is a concern.
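
To make the difference concrete, here is a small sketch of how the same text looks as bytes in the three encodings discussed. It assumes 'ANSI' means Windows-1252 (`cp1252`), which is what it means on a Western European Windows install; the sample text and the `strip_bom` helper are made up for illustration.

```python
import codecs

text = "caf\u00e9"  # "café": one non-ASCII character makes the difference visible

ansi = text.encode("cp1252")         # 'ANSI' on a Western European Windows system
utf8 = text.encode("utf-8")          # UTF-8 without BOM
utf8_bom = text.encode("utf-8-sig")  # UTF-8 with the BOM prepended


def strip_bom(data: bytes) -> bytes:
    """Drop a leading UTF-8 BOM if present, otherwise return data unchanged."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


print("cp1252:     ", ansi)
print("utf-8:      ", utf8)
print("utf-8 + BOM:", utf8_bom)
print("BOM bytes:  ", codecs.BOM_UTF8)
```

For pure ASCII content the first two encodings produce identical bytes, so the BOM is the only thing that distinguishes such a file as UTF-8.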

  • ANSI uses the system _codepage_; it will only be readable if the other end uses the same codepage. – SLaks May 22 '12 at 14:28
  • Yes this is not very helpful. Site A saves files from the browser in 'ANSI' (Windows-1252) which is what I want. Site B (identical code, same server) is saving files in UTF-8 which don't work when they are Excel files. – Frank D May 22 '12 at 14:44
  • Do the default settings on the two sites differ? Maybe one has "Always save as UTF-8" set on and the other hasn't. – rossum May 22 '12 at 14:57
  • @rossum Where are these settings defined? (the problem is happening in all browsers) – Frank D May 22 '12 at 15:15
  • @Frank D: I don't know where the settings are defined. Something is different between your two sites, otherwise they would produce the same output. You will have to read the manual, "The horror! The horror!" :) – rossum May 22 '12 at 15:24
0

Thanks for all your input into this, it's much appreciated. In the end I tracked it down: a PHP source file was being included somewhere along the way which was encoded as UTF-8 rather than ANSI (Windows-1252). I don't really understand why this causes a problem though, since that PHP include doesn't output anything. Very weird and very frustrating, but I hope someone else finds my pain useful.
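
A likely explanation, assuming the included file was saved as UTF-8 with a BOM: the three BOM bytes sit before the opening `<?php` tag, and PHP sends anything outside its tags to the browser as ordinary output. The include therefore does emit three bytes, and they end up at the very start of the download. The sketch below shows one way to hunt for such files. The function names and the extension list are assumptions for illustration, not part of any existing tool.

```python
import codecs
import os


def starts_with_bom(path: str) -> bool:
    """Return True if the file begins with the UTF-8 byte order mark."""
    with open(path, "rb") as handle:
        return handle.read(len(codecs.BOM_UTF8)) == codecs.BOM_UTF8


def find_bom_files(root: str, extensions=(".php", ".inc")):
    """Walk `root` and return the sorted paths of source files that start with a BOM."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.lower().endswith(extensions):
                path = os.path.join(dirpath, name)
                if starts_with_bom(path):
                    hits.append(path)
    return sorted(hits)


if __name__ == "__main__":
    # Scans the current directory; point it at the site's document root instead.
    for path in find_bom_files("."):
        print("BOM found:", path)
```

Running this against both sites' source trees and comparing the results should show whether a stray BOM accounts for the difference.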

Frank D