Friends, I am preparing a TSV file from an Excel file that contains Chinese (special) characters, for example: The Seonjeongneung ... Jeonghyeon (貞顯王后, 1462–1530) .....

I have tried the Perl CPAN modules Spreadsheet::ParseExcel and Spreadsheet::ParseExcel::FmtJapan, but without success. The characters appear as ?? in the TSV file when it is opened in Vim.

I also tried "binmode STDOUT, ':utf8';" and "binmode STDOUT, ':encoding(cp932)';", but neither helped.

Please help me find a way to extract the information from Excel sheets and write it out in TSV format.

PS: Excel can save directly as TSV, but the output was garbled there as well.

Code4Fun
  • If someone thinks that Python/Java/PHP etc. has a better way to handle this situation, I can try that as well. I need to extract the data into a TSV file correctly to start my actual project – Code4Fun Sep 26 '12 at 17:39
  • Recently I had problems with Chinese UTF-8 in my Perl work. Many underlying problems can hide there, so google a little to see which language fits your task better; I don't see any reason to use Perl specifically. – Galimov Albert Sep 26 '12 at 17:49
  • I have been searching on Google for the last 6-8 hours and spent some time on Python as well, but could not find a way to read Chinese characters from an Excel file – Code4Fun Sep 26 '12 at 17:55
  • Let's think: if **they** use some language, it has good support for their encoding. [Google trends](http://www.google.com/trends/?q=python,perl,java&ctab=0&geo=cn&geor=all&date=all&sort=0) – Galimov Albert Sep 26 '12 at 18:03

1 Answer

I just exported your sample text perfectly from OpenOffice Calc, just by choosing the "Save as .csv" option and choosing UTF-8 as format. I'd be very surprised if Excel can't do the same. Have you considered the possibility that VIM / your console doesn't support Chinese characters correctly or that it's set to use a font that doesn't include Chinese characters? To check for this kind of error, open your .csv or .tsv file in your web browser. Web browsers will do anything to correctly display a file, including changing fonts as necessary.
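Another way to separate a display problem from a conversion problem is to look at the file's bytes instead of how an editor renders them. Here is a minimal Python sketch, assuming the export is meant to be UTF-8; the function name `diagnose_tsv` and the sample row are made up for illustration:

```python
import os
import tempfile


def diagnose_tsv(path):
    """Classify a file by its raw bytes, independent of any editor or font."""
    with open(path, "rb") as handle:
        raw = handle.read()
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        return "not UTF-8"
    if any(ord(ch) > 127 for ch in text):
        return "UTF-8 with non-ASCII text"
    if "?" in text:
        return "ASCII with question marks"
    return "plain ASCII"


if __name__ == "__main__":
    # Hypothetical sample row, written as UTF-8 to a temporary file.
    with tempfile.NamedTemporaryFile("w", suffix=".tsv", encoding="utf-8",
                                     delete=False) as sample:
        sample.write("Jeonghyeon\t貞顯王后\t1462–1530\n")
    print(diagnose_tsv(sample.name))
    os.remove(sample.name)
```

If the result is "UTF-8 with non-ASCII text", the file is probably fine and the editor, terminal, or font is the likely culprit. "ASCII with question marks" suggests the characters were already lost during conversion, and "not UTF-8" suggests the file is in some other encoding.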

If you want, send me the file you need to export and I'll check if there's anything weird about it. The text could be in one of the native Chinese encodings (GB or Big5) instead of Unicode.
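To check the native-encoding possibility yourself, you could try decoding the raw bytes with a few candidate encodings and compare the results by eye. This is only a rough sketch: the helper name `decode_candidates` and the candidate list are my assumptions, not something taken from your file.

```python
def decode_candidates(raw, encodings=("utf-8", "big5", "gb18030")):
    """Try each candidate encoding; map its name to the decoded text or None."""
    results = {}
    for name in encodings:
        try:
            results[name] = raw.decode(name)
        except UnicodeDecodeError:
            results[name] = None
    return results


if __name__ == "__main__":
    # Stand-in for bytes read from the exported file with open(path, "rb").
    sample = "貞顯王后".encode("gb18030")
    for name, text in decode_candidates(sample).items():
        print(name, "->", repr(text))
```

A decode that succeeds does not prove the encoding is the right one, because the multi-byte Chinese encodings overlap and the wrong one can produce readable-looking nonsense. Treat the output as a hint and check which candidate gives the text you expect.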

Sprachprofi
  • Thanks for the help, Sprachprofi. I have managed to store the data from the Excel file in a TSV file, with the accented characters intact, using Perl's ParseExcel module and the Unicode formatter. – Code4Fun Sep 27 '12 at 13:53