I am trying to fetch a page from the internet and then save it as an HTML file. The page has this in its header:
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="ja" >
<head>
<meta http-equiv="Content-Type" content="text/html;charset=Shift_JIS" />
</head>
No matter what I try, the saved HTML page looks horrible and I just can't get it to save the Japanese characters properly.
I am using node-fetch, fs.writeFile and a module named jconv. I have tried every combination I could think of, but nothing works. Right now, the code is supposed to convert from SJIS to UTF-8, and then fs should write the file with UTF-8 encoding.
fetch(link)
    .then((res) => {
        if (res.ok) {
            return res.text();
        }
        console.log("Invalid data");
    })
    .then((body) => {
        // this is supposed to convert from SJIS to UTF-8
        var buf = jconv.convert(body, 'SJIS', 'UTF-8');
        // save file
        fs.writeFile(path, buf, 'UTF-8', (err) => {
            if (!err) {
                console.log('Saved');
            }
        });
    });
I have tried other encodings, but the final HTML document still does not show the special characters the way they appear on the online page it was taken from. A page that I am testing right now is this
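
One likely cause, as far as I can tell: res.text() in node-fetch always decodes the response body as UTF-8, so the Shift_JIS bytes are already damaged before jconv ever sees them. Below is a minimal sketch of decoding from the raw bytes instead. It assumes Node.js 18+ (global fetch) built with full ICU, so that TextDecoder accepts the 'shift_jis' label. The names decodeShiftJis, declareUtf8 and savePage are hypothetical, and TextDecoder is swapped in for jconv here. It also rewrites the meta charset, since a UTF-8 file that still declares Shift_JIS would likely be misread by the browser.

```javascript
const fs = require('fs');

// Decode raw Shift_JIS bytes into a JavaScript string.
// Assumption: this Node.js build has full ICU, so 'shift_jis' is a known label.
function decodeShiftJis(bytes) {
  return new TextDecoder('shift_jis').decode(bytes);
}

// Point the <meta> charset at UTF-8 so the saved file declares its real encoding.
function declareUtf8(html) {
  return html.replace(/charset=Shift_JIS/i, 'charset=UTF-8');
}

// Hypothetical helper: fetch the raw bytes, decode them, and save as UTF-8.
async function savePage(link, path) {
  const res = await fetch(link);
  if (!res.ok) {
    throw new Error(`Request failed with status ${res.status}`);
  }
  // Read the body as bytes; res.text() would decode it as UTF-8 first.
  const bytes = Buffer.from(await res.arrayBuffer());
  const html = declareUtf8(decodeShiftJis(bytes));
  await fs.promises.writeFile(path, html, 'utf8');
}
```

With this in place, the call would be something like savePage(link, path).then(() => console.log('Saved')), using the same link and path variables as in my code above.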