2

I have a Japanese client and have generated a large flat file (1.2 million rows) of data to send to them.

The file is UTF-8 encoded, which supports storing and displaying all the Japanese characters. The client wishes to receive the file in Shift JIS (Shift_JIS), an encoding designed specifically for Japanese text.

  1. The Wikipedia page describes the conversion logic.
  2. Online converters such as motobit let you convert between encodings.

My issue is that my file is quite large, and I will have to repeat this for several hundred more files. The copy-paste field on the online converter won't scale to that size and isn't quick enough.

Does anyone know of a free desktop application, or perhaps even a Ruby library, that I could use to convert encodings? Or any other suggestions?

Thanks!

user2490003
  • What system? On most *NIX systems iconv is installed by default: `$ iconv -f UTF-8 -t SJIS file.csv > file.sjis.csv` – deceze May 10 '14 at 16:19
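
Since the question mentions several hundred more files, here is a minimal sketch of how the iconv command above might be wrapped in a loop. The function name `convert_dir`, the `.csv` extension, and the `.sjis.csv` output suffix are assumptions for illustration, not anything the client specified. iconv exits with an error when it meets a character that Shift_JIS cannot represent, so the sketch reports such files instead of leaving a truncated output behind.

```shell
#!/bin/sh
# Sketch: convert every UTF-8 .csv file in one directory to Shift_JIS.
# convert_dir and the directory layout are hypothetical names.
convert_dir() {
    src_dir=$1
    dst_dir=$2
    mkdir -p "$dst_dir" || return 1
    status=0
    for src in "$src_dir"/*.csv; do
        [ -e "$src" ] || continue            # the glob matched nothing
        dst="$dst_dir/$(basename "$src" .csv).sjis.csv"
        # iconv streams the file, so file size is not a concern here.
        if ! iconv -f UTF-8 -t SHIFT_JIS "$src" > "$dst"; then
            echo "conversion failed: $src" >&2
            rm -f "$dst"                     # do not keep partial output
            status=1
        fi
    done
    return $status
}
```

It could be called as `convert_dir ./utf8 ./sjis`. If the client opens the files on Windows, the `CP932` encoding name may be a closer match than `SHIFT_JIS`, but that is worth confirming with them first.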

2 Answers

1

No separate tool or utility is needed; you can use gedit to convert your files. Follow the steps below:

  1. Open the file you wish to convert in gedit, whatever encoding it may be in.

  2. Copy all the contents of the file and paste them into a new gedit document.

  3. Now save the file. In the save dialog, select SHIFT_JIS as the character encoding before saving. Change the line ending if you want to. If SHIFT_JIS does not appear among the options, click the Add or Remove button just below the list.

  4. In the dialog that opens, select SHIFT_JIS from the available encodings in the left column and click the Add button. Once it has been added to the encoding menu, select it and save the file.

Vipin Verma
  • Downvote: Using a GUI tool where a command-line tool is available is *not* an improvement, and likely to break (or just suck immensely) on large files where a simple command-line tool will process arbitrarily large files a line at a time. – tripleee Mar 31 '16 at 08:15
0

I think what you want is nkf, the Network Kanji Filter.

You can convert a file from UTF-8 to Shift_JIS like this:

% nkf -s file-utf8.txt > file-sjis.txt

manual page:
http://linuxcommand.org/man_pages/nkf1.html

Wikipedia:
http://en.wikipedia.org/wiki/Network_Kanji_Filter

You can install nkf like this:

% sudo yum install nkf     # yum-based Linux distributions
% sudo port install nkf    # MacPorts
% brew install nkf         # Homebrew

Hope this helps.

naota
  • There should be no need to install anything, as `recode` or `iconv` should already be installed on any reasonably post-Columbian U*x system. – tripleee Mar 31 '16 at 08:19
  • I am trying to convert a file from UTF-8 to Shift_JIS, but it doesn't work as expected. The resulting file has weird characters in it instead of Japanese ones. All the JP characters are garbled - http://imgur.com/sbibRAT – Vipin Verma Apr 02 '16 at 09:24
  • Converted a file using `nkf -s prt1shift.txt > klklklklklk.txt`, and instead of SHIFT_JIS it got converted into Western (ISO-8859-15) with all the JP characters converted to boxes – Vipin Verma Apr 02 '16 at 10:30
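
One possible reading of the comments above is that the viewer is guessing the wrong encoding when it opens the converted file, rather than the conversion itself failing; that is an assumption, not something the thread confirms. A minimal sketch of a check that does not depend on any viewer is to convert the result back to UTF-8 and compare it byte for byte with the original. The function name `roundtrip_ok` is hypothetical.

```shell
#!/bin/sh
# Sketch: succeed only if the Shift_JIS file decodes back to exactly
# the same bytes as the original UTF-8 file.
# roundtrip_ok is a hypothetical helper name.
roundtrip_ok() {
    original=$1     # the UTF-8 source file
    converted=$2    # the Shift_JIS file produced from it
    # cmp reads the decoded stream on stdin ("-") and compares it with
    # the original; -s suppresses output, so only the exit status matters.
    iconv -f SHIFT_JIS -t UTF-8 "$converted" | cmp -s - "$original"
}
```

For example, `roundtrip_ok file-utf8.txt file-sjis.txt && echo "round trip matches"`. A mismatch does not prove the file is corrupt, since a converter may normalize some characters on the way, but a match is reasonable evidence that the boxes are a display problem.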