
I want to parse a CSV file into JSON. The parsing itself works, but the Japanese characters come out garbled.

I am using Papa Parse to convert the CSV to JSON.

Here is my code:

Papa.parse("http://localhost:3000/readdata.csv", {
    download: true,
    header: true,
    worker: true,
    encoding: 'Shift-JIS',
    // Called once per parsed row.
    step: function(row) {
      console.log(row);
    },
    complete: function() {
      console.log("All done!");
    }
});

Output:

{��s����: "0", ��s��(��): "�����", ��s��(����): "���{��s", �x�X����: "79", �x�X��(��): "���-", …}

The parsing works, but the encoding does not.

Is there another way to parse a huge Japanese CSV file into JSON?

TomPlum
Parth Raval
  • Is the encoding of the file correct – it is Shift JIS? (If the file is UTF-8, then it needs to be read that way, for example.) – Zoe Edwards Mar 21 '18 at 13:34
  • @ThomasEdwards yes it worked fine but getting some error in response {��s����: "0", ��s��(��): "�����", ��s��(����): "���{��s", �x�X����: "79", �x�X��(��): "���-", …} – Parth Raval Mar 21 '18 at 13:34
  • Can you give an example line from the CSV, and what `row` prints in the log? – Zoe Edwards Mar 21 '18 at 13:35
  • Yes I have updated my question sir please check question sir... if there is another solution then please let me know – Parth Raval Mar 21 '18 at 13:35
  • We need to see the CSV too to understand. If possible could you upload the CSV somewhere so we can download it? You only need to include a few lines of it. – Zoe Edwards Mar 21 '18 at 13:36
  • @ThomasEdwards wait sir... i will send you in few minutes – Parth Raval Mar 21 '18 at 13:38
  • http://mayursarang.com/PD/readdata.csv you can find data in csv here – Parth Raval Mar 21 '18 at 13:42
  • That file appears to have been encoded in Windows-1252, not Shift JIS. If I do a lookup I get `charset=unknown-8bit`. Check how that file is being saved. – Zoe Edwards Mar 21 '18 at 13:45
  • so you mean to say that need to change encoding:'Windows 1252'..will solve my problem? – Parth Raval Mar 21 '18 at 13:50
  • That CSV either has no set encoding, so it’s defaulting to Western default for me, or it is set to that. If you’ve exported it from Excel, that’s probably what has happened. You could try exporting as UTF-8 for safety. If I convert `‹âsº°ÄÞ` using `windows-1252/latin1` into bytes, I get `8B E2 8D 73 BA B0 C4 DE`, which when converted to Shift JIS becomes `銀行コード` – which is ‘bank code’? – Zoe Edwards Mar 21 '18 at 13:52
  • @ThomasEdwards, No Sir If i upload file then it works perfectly...but when i am trying remote file..it is not working – Parth Raval Mar 21 '18 at 13:55
  • Is there any other solution for remote file parse ?? – Parth Raval Mar 21 '18 at 14:00
  • You need to be sure that it’s sending it to you correctly; it is likely the server causing the problem, not Papa. If you do `curl -i http://localhost:3000/readdata.csv` in the command line, what is the `Content-Type`? – Zoe Edwards Mar 21 '18 at 14:05
  • Maybe fetching the file via `XMLHttpRequest` and forcing the encoding is the best way. I tried setting the `Content-Type` to use Shift-JIS, but I'd get garbled text anyway. – sneep Mar 21 '18 at 17:22
  • @ThomasEdwards Thanks Sir I really appreciate your efforts Sir thanx :-)... and for Content-Type i got "text/csv" – Parth Raval Mar 22 '18 at 05:27
  • No encoding type after it? – Zoe Edwards Mar 22 '18 at 09:14
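
The byte-level check described in the comments can be reproduced in a few lines. This is only a minimal sketch, assuming a runtime whose `TextDecoder` supports the `shift_jis` label (current browsers, or Node.js built with full ICU); `decodeShiftJis` is a hypothetical helper name, not part of Papa Parse.

```javascript
// Hypothetical helper: decode raw bytes as Shift_JIS.
function decodeShiftJis(bytes) {
    // `fatal: true` makes invalid byte sequences throw instead of
    // silently turning into U+FFFD replacement characters.
    var decoder = new TextDecoder('shift_jis', { fatal: true });
    return decoder.decode(new Uint8Array(bytes));
}

// The eight bytes quoted in the comment above.
var headerBytes = [0x8B, 0xE2, 0x8D, 0x73, 0xBA, 0xB0, 0xC4, 0xDE];
var header = decodeShiftJis(headerBytes);

// The katakana in these bytes are half-width; NFKC normalization folds
// them into the full-width form quoted in the comment.
console.log(header.normalize('NFKC')); // 銀行コード
```

If the same bytes are instead decoded as `windows-1252`, the result is the kind of garbled text shown in the question, which suggests the bytes in the file are fine and only the decoding step is wrong.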

1 Answer


I didn't really modify the relevant parts of your code, but it seems to work for me when the file is picked through a local file input (Firefox 58 here).

<html>
<head>
    <script src="papaparse.js"></script>
</head>
<body>
    <script>
    function openFile(event) {
        var input = event.target;
        Papa.parse(input.files[0], {
            // No `download` option here: it is meant for URL strings, and this is a File object.
            header: true, 
            worker: true,
            encoding: 'Shift-JIS',
            complete: function(results) {
                console.log("All done!", results.data);
            }
        });
    }
    </script>
    <input type='file' onchange='openFile(event)'><br>
</body>
</html>

Unfortunately, this didn't work for me when I retrieved the file from a URL, even when I set the web server's headers to:

Content-Type: text/plain; charset=shift_jis

or

Content-Type: text/plain; charset=shift-jis

Update: Setting the header does in fact work. You may still see garbled text, however, if the browser has an old copy of the file in its cache.

Here's a demo: https://blog.qiqitori.com/stackexchange/papaparse/papaparse-sjis-from-url.html

$ curl -I https://blog.qiqitori.com/stackexchange/papaparse/readdata-charset-sjis.csv
HTTP/1.1 200 OK
Date: Thu, 22 Mar 2018 05:23:49 GMT
Server: Apache/2.4.25 (Debian)
Last-Modified: Wed, 21 Mar 2018 15:48:17 GMT
ETag: "15a-567ee1ea9847f"
Accept-Ranges: bytes
Content-Length: 346
Vary: Accept-Encoding
Content-Type: text/plain; charset=shift_jis
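
To check from script which charset, if any, the server declares, the `Content-Type` value can be split into its parameters. A minimal sketch; `charsetOf` is a hypothetical helper, not part of Papa Parse:

```javascript
// Hypothetical helper: extract the charset parameter from a
// Content-Type header value, or return null when none is declared.
function charsetOf(contentType) {
    var match = /;\s*charset\s*=\s*"?([^\s;"]+)"?/i.exec(contentType || '');
    return match ? match[1].toLowerCase() : null;
}

console.log(charsetOf('text/plain; charset=shift_jis')); // shift_jis
console.log(charsetOf('text/csv'));                      // null
```

In a browser, the header value could come from `response.headers.get('Content-Type')` after a `fetch` call. A `null` result, as with the bare `text/csv` reported in the comments, means no charset is declared and the browser falls back to a default.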

If you cannot change your server settings, here's a workaround that needs no server changes at all: use XMLHttpRequest to load the CSV into a string, forcing the encoding to Shift-JIS, and then pass that string to Papa Parse.

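// Fetch `url` as text, decoding the response as Shift_JIS regardless of
// the charset the server declares, then pass the text to `callback`.
function load(url, callback) {
    var xhr = new XMLHttpRequest();
    xhr.onreadystatechange = function () {
        // readyState 4 means the request has completed.
        if (xhr.readyState === 4 && xhr.status === 200)
            callback(xhr.responseText);
    };
    xhr.open("GET", url, true);
    // Must be called before send(); overrides the server's Content-Type.
    xhr.overrideMimeType('text/plain; charset=Shift_JIS');
    xhr.send();
}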

load("http://.../readdata.csv", function (contents) {
    // `contents` is already a decoded string, so the `download` and
    // `encoding` options are no longer needed.
    Papa.parse(contents, {
        header: true,
        worker: true,
        complete: function(results) {
            console.log("All done!", results.data);
        }
    });
});
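
Where `fetch` and `TextDecoder` are available, the same idea can be written without `overrideMimeType`: download the raw bytes and decode them explicitly. This is a sketch under that assumption, not something Papa Parse provides; `decodeBytes` and `loadAs` are hypothetical names.

```javascript
// Hypothetical helper: decode an ArrayBuffer using the given encoding label.
function decodeBytes(buffer, encoding) {
    return new TextDecoder(encoding).decode(buffer);
}

// Hypothetical helper: fetch `url` as raw bytes and resolve to the
// text obtained by decoding those bytes as `encoding`.
function loadAs(url, encoding) {
    return fetch(url).then(function (response) {
        if (!response.ok) {
            throw new Error('HTTP ' + response.status);
        }
        return response.arrayBuffer();
    }).then(function (buffer) {
        return decodeBytes(buffer, encoding);
    });
}

// Usage, in a page that has loaded Papa Parse:
// loadAs("http://.../readdata.csv", "shift_jis").then(function (contents) {
//     Papa.parse(contents, {
//         header: true,
//         complete: function (results) {
//             console.log("All done!", results.data);
//         }
//     });
// });
```

Like the `load()` version, this holds the whole file in memory as a string before parsing starts, which may matter for a very large CSV.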
sneep
  • Sorry, directly linking to papaparse.com's `papaparse.js` didn't appear to work, so I removed the link. – sneep Mar 21 '18 at 14:26
  • thank you sneep san ...You are very genius san..This answer helped me a lot thanx :-) – Parth Raval Mar 22 '18 at 05:26
  • I actually updated it a second ago. Turns out that it does work when you modify the server's headers, but you may run into caching problems. Papa Parse currently doesn't appear to allow you to override the MIME type, but maybe it would make sense to add this feature sometime in the future. – sneep Mar 22 '18 at 05:29
  • Ok so can you please tell me, what is solution for that? – Parth Raval Mar 22 '18 at 05:33
  • I'd suggest just using the second code snippet I pasted, the one with the `load()` function. Sorry if I'm being confusing. – sneep Mar 22 '18 at 06:56
  • this helped me a lot... doumo arigatou gozaimasu :-) – Parth Raval Mar 22 '18 at 07:15
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/167316/discussion-between-parth-raval-and-sneep). – Parth Raval Mar 22 '18 at 07:16