
I am posting this after several hours of research, spread over several attempts, and I have not found an answer yet.

My goal is to write a CSV file using PHP. The file must use the Chinese "ANSI" encoding (I suppose that is GB2312 for simplified Chinese; Notepad++ only shows "ANSI" as the encoding). This is required in order to import the file into another tool.

[Important note]

We are currently converting a file with notepad++ and a PC that has Chinese as default language. The process is:

  • get the UTF8 CSV from the web-app
  • save as csv with Excel 2003 on the Chinese PC
  • open it in Notepad++ (the encoding is already ANSI) and delete the single leading "?" at the beginning of the file.

I ran a test: I renamed my .csv file to .php and replaced its content with the following code, so that the file keeps the same encoding:

<?php echo mb_detect_encoding("test"); ?>

This will print: "ASCII".

So I am not sure what encoding my CSV output should have: GB2312, ASCII, or ANSI? I am not even clear on the difference between them.

I also read that a file saved as CSV with Excel 2007 on a Chinese PC works with this tool.

[/Important Note]

So far I have not managed to get it right. When I open the generated file in Notepad++, its encoding is still shown as UTF-8. That is also obvious from the content: the Chinese characters display correctly, whereas they should look "broken" :-).

I am sending the following headers:

header("Content-type: text/csv; charset=GB2312");
header("Content-Disposition: attachment; filename=$filename.csv");
header("Content-Transfer-Encoding: binary"); 
header("Cache-Control: must-revalidate, post-check=0, pre-check=0");
header("Pragma: no-cache");
header("Expires: 0");

[Additional information]

My file is built like this (simplified to keep it easy to read):

//header, hard coded in Chinese
$csv = "东西,东西,东西\n"; //example "stuff,stuff,stuff"
[...]
//write line by line, status is also hard coded (行)
$csv .= $DB_data_1.",".$DB_data_2.",行\n"; //行=OK

[/Additional information]

I also convert my CSV string to GB2312 with iconv before printing it (I also tried mb_convert_encoding):

setlocale(LC_ALL,'zh_CN');
$csv = iconv("UTF-8","GB2312",$csv);
echo($csv);

My .php file is saved as UTF-8 (with BOM, not "UTF-8 without BOM").

Basically, I always get a UTF-8 file as output, but I need ANSI. There seem to be so many parameters and attributes involved that I cannot get it right. Your help would be appreciated!

Thanks!

David

[Additional information]

As an example, one column of my header goes through the following encoding change:

  • in PHP source code (UTF-8 file, English computer): 商品序号 (meaning: SKU, item code)
  • in the final CSV file (ANSI file, English computer): ÉÌÆ·ÐòºÅ
  • in the final CSV file (ANSI file, Chinese computer): 商品序号

[/Additional information]

david_b
  • `iconv("UTF-8","GB2312",$csv)` should work perfectly fine. I don't know why you're trying with ASCII. What you want is to convert UTF-8 data to GB2312. – deceze Jul 27 '12 at 10:40
  • Yes, I can see a difference with this function. Actually, the header line in Chinese (it gives the column names for this CSV file) is hardcoded. `$csv = "商家编码,属性序号\n"; //header line` then `$csv .= "123456,100\n" //each data line will write something like this` So the problem seems to come from the encoding of this specific line. The thing is, my file is in UTF-8, so it should be OK... I can read the Chinese properly in my source code. – david_b Jul 30 '12 at 04:09
  • Also, my file keeps saving in UTF-8 format. – david_b Jul 30 '12 at 06:29

2 Answers


string mb_convert_encoding ( string $str , string $to_encoding [, mixed $from_encoding ] )

Note that the second parameter is the target encoding ($to_encoding), and the source encoding comes third. So it should be:

$csv = mb_convert_encoding($csv, "GB2312", "UTF-8");
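
As a rough check of the argument order, the sketch below converts the header string from the question's own example and prints the resulting bytes. It assumes the mbstring extension is loaded and that this source file is saved as UTF-8; the variable names are only illustrative. The expected bytes match the "broken" text the question reports on an English computer (ÉÌÆ·ÐòºÅ is C9 CC C6 B7 D0 F2 BA C5 when read as Windows-1252).

```php
<?php
// Assumes this source file is saved as UTF-8, so the literal holds UTF-8 bytes.
$utf8 = "商品序号";

// Argument order: mb_convert_encoding($string, $to_encoding, $from_encoding)
$gb = mb_convert_encoding($utf8, "GB2312", "UTF-8");

// Four Chinese characters: 3 bytes each in UTF-8, 2 bytes each in GB2312.
echo strlen($utf8), " -> ", strlen($gb), "\n"; // prints "12 -> 8"

// Hex dump of the converted bytes.
echo bin2hex($gb), "\n"; // prints "c9ccc6b7d0f2bac5"
```

If the two encoding arguments are swapped, the function treats the UTF-8 input as GB2312 and the result is not the byte sequence above.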
xdazz

The HTTP headers you send only tell the client which charset you are replying in; they do not convert the content for you. So if you specify charset=GB2312 but send UTF-8, you're merely lying. In any case, the charset attribute doesn't make any sense here, as the content is transferred as binary anyway.

What you need to do is convert the content before sending it. iconv or mbstring are the proper tools for this. Start by making sure you know what charset your data comes in. Presumably it's loaded from somewhere (like a database). Considering you're a bit lost, there's a good chance that it isn't what you think it is. For example, it might well be ISO-8859-1 and not UTF-8.

Once you're sure it's indeed utf-8, use iconv as you have already tried:

$csv = iconv('UTF-8', 'GB2312', $csv);

This assumes that $csv is a string containing the contents of the CSV file.
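
Putting these pieces together, a minimal sketch of the whole download might look like the following. The helper name `to_gb2312_csv`, the sample rows, and the file name `export.csv` are hypothetical. It assumes the iconv extension is available, that every input string really is valid UTF-8, and that nothing (including a BOM in front of `<?php`) has been output before the headers. Fields are joined naively, so values containing commas or quotes would need proper CSV escaping.

```php
<?php
/**
 * Build a CSV string from UTF-8 rows and return it encoded as GB2312.
 * Hypothetical helper for illustration; not part of PHP itself.
 */
function to_gb2312_csv(array $rows): string
{
    $csv = "";
    foreach ($rows as $row) {
        // Naive join: assumes no field contains a comma, quote, or newline.
        $csv .= implode(",", $row) . "\n";
    }

    // Drop a UTF-8 BOM if one slipped into the data.
    if (strncmp($csv, "\xEF\xBB\xBF", 3) === 0) {
        $csv = substr($csv, 3);
    }

    // iconv() returns false when the input cannot be converted.
    $converted = iconv("UTF-8", "GB2312", $csv);
    if ($converted === false) {
        throw new RuntimeException("Conversion to GB2312 failed");
    }
    return $converted;
}

$body = to_gb2312_csv([
    ["东西", "东西", "东西"], // header row, as in the question
    ["123456", "100", "行"],  // sample data row (made-up values)
]);

// Send headers only in a web context and only if nothing was output yet.
if (PHP_SAPI !== "cli" && !headers_sent()) {
    header("Content-Type: text/csv; charset=GB2312");
    header('Content-Disposition: attachment; filename="export.csv"');
    header("Content-Length: " . strlen($body));
    echo $body;
}
```

On a simplified Chinese Windows system, "ANSI" normally means code page 936 (GBK), which is a superset of GB2312. If the data contains characters outside GB2312, a target such as "GBK" or "CP936" may be worth trying, although which names are accepted depends on the iconv implementation in use.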

troelskn
  • Actually, the header line in Chinese (it'll give the columns names for this CSV file) is hardcoded. `$csv = "商家编码,属性序号\n"; //header line` then `$csv .= "123456,100\n" //each data line will write something like this` – david_b Jul 30 '12 at 01:26
  • In that case, the encoding will be the same as the file encoding. So you have to make certain you're saving as utf-8. And that should be utf-8 *without* BOM, not *with* BOM. – troelskn Jul 30 '12 at 08:26
  • OK, then as I don't want UTF-8 as output but the Chinese format, the only solutions are: 1. code the file on a Chinese computer (or switch my Windows encoding settings while coding this file); 2. find software that already uses the right encoding (I have no idea which, and Notepad++ doesn't seem to do it); 3. find a way to change the encoding of the output file even if my file is in UTF-8 (that's the goal of this post). – david_b Jul 31 '12 at 05:37
  • And 3 seems more and more impossible. The only problem with 1 is that the Chinese characters look ugly and are not editable on my computer. Anyway, if you have any other ideas to achieve 3, I'd be more than happy. – david_b Jul 31 '12 at 05:39
  • You are correct, #3 is what you're going for here. Assuming that your file is stored as UTF-8 (Without BOM), using iconv as per my response should convert the string to GB2312. You can then send it to the user (echo it out) and the recipient should get GB2312 text. – troelskn Jul 31 '12 at 09:21
  • The problem might be how you're testing it - I'm not sure, but maybe your browser is converting to UTF-8 on its own account? Could you try to download the file using something like curl and then inspect the file? – troelskn Jul 31 '12 at 09:22
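
One way to take browsers and editors out of the picture is to inspect the downloaded bytes directly. The sketch below is only an illustration under stated assumptions: `describe_bytes` is a hypothetical helper, it needs the mbstring extension, and it only distinguishes "starts with a UTF-8 BOM", "ASCII only", "valid UTF-8", and "not UTF-8" (which is what GB2312-encoded Chinese text would normally be). It does not prove that the bytes are GB2312.

```php
<?php
/**
 * Roughly classify a byte string. Hypothetical helper for illustration.
 */
function describe_bytes(string $bytes): string
{
    // A UTF-8 byte order mark is the three bytes EF BB BF.
    if (strncmp($bytes, "\xEF\xBB\xBF", 3) === 0) {
        return "UTF-8 with BOM";
    }
    // No byte above 0x7F: the text is the same in ASCII, UTF-8, and GB2312.
    if (!preg_match('/[\x80-\xFF]/', $bytes)) {
        return "ASCII only";
    }
    if (mb_check_encoding($bytes, "UTF-8")) {
        return "valid UTF-8";
    }
    return "not UTF-8";
}

echo describe_bytes("test"), "\n";                             // ASCII only
echo describe_bytes("\xEF\xBB\xBF" . "a,b\n"), "\n";           // UTF-8 with BOM
echo describe_bytes("商品序号"), "\n";                          // valid UTF-8
echo describe_bytes("\xC9\xCC\xC6\xB7\xD0\xF2\xBA\xC5"), "\n"; // not UTF-8

// For a downloaded file, something like:
// echo describe_bytes(file_get_contents("export.csv")), "\n";
```

The "ASCII only" case is also why `mb_detect_encoding("test")` reports ASCII in the question: a string with no bytes above 0x7F looks identical in all of these encodings, so it says nothing about how Chinese characters in the same file would be encoded.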