
I am trying to fetch a page from the internet and then save it into an HTML file. The page has this in the header:

<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="ja" >
<head>
<meta http-equiv="Content-Type" content="text/html;charset=Shift_JIS" />
</head>

No matter what I try, the saved HTML page looks horrible and I just can't get it to save the Japanese characters properly.

I am using node-fetch, fs.writeFile and a module named jconv. I have tried all the combinations, but nothing works. Right now the code is supposed to convert the body from SJIS to UTF-8, and then fs should write the file with UTF-8 encoding.

fetch(link)
    .then((res) => {
        if (res.ok) {
            return res.text();
        }
        console.log("Invalid data");
    })
    .then((body) => {
        // this is supposed to convert from SJIS to UTF-8
        var buf = jconv.convert(body, 'SJIS', 'UTF-8');

        // save file
        fs.writeFile(path, buf, 'UTF-8', (err) => {
            if (!err) {
                console.log('Saved');
            }
        });
    });

I have tried other encodings, but the final HTML document still does not show the special characters properly, unlike the online page it was taken from. A page that I am testing right now is this

bMain

1 Answer


The line:

<meta http-equiv="Content-Type" content="text/html;charset=Shift_JIS" />

must also be modified to:

<meta http-equiv="Content-Type" content="text/html;charset=UTF-8" />

to have the charset information in the header match the new encoding.

  • I don't have access to the source of that page. Must I change that in the node script, after I download the page, before encoding and saving? – bMain Apr 09 '19 at 23:47
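Since the page source isn't editable, the charset declaration can be patched in the script itself, on the converted UTF-8 string, just before saving. A minimal sketch (the `patchCharset` name and the regex are illustrative, not from the original code):

```javascript
// Patch the declared charset in the page header so it matches the
// UTF-8 bytes that will be written to disk. The regex targets the
// http-equiv meta tag shown in the question; adjust it if a page
// declares its charset differently.
function patchCharset(html) {
    return html.replace(/charset=Shift_JIS/i, 'charset=UTF-8');
}

// Example: the header line from the question
const header = '<meta http-equiv="Content-Type" content="text/html;charset=Shift_JIS" />';
console.log(patchCharset(header));
// <meta http-equiv="Content-Type" content="text/html;charset=UTF-8" />
```

Note also that `res.text()` in node-fetch decodes the response body as UTF-8 regardless of the page's actual encoding, so it may corrupt the Shift_JIS bytes before jconv ever sees them; fetching the raw bytes (e.g. with `res.buffer()` in node-fetch v2) and passing that Buffer to `jconv.convert` may also be necessary.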