4

I'm trying to grab some text from a Portuguese site that is encoded in ISO-8859-1 according to the meta tag. I'm using NodeJS and the request-promise package. What I get back, for example, is

Alg�s

I believe I need to convert that to ISO-8859-1 in Node.js/JavaScript. I have tried decodeURIComponent, encodeURIComponent, unescape and escape. None of those worked, and some of them made the string even worse. Does anyone have any idea how to solve this?

Thanks in advance.

pedroremedios
  • If the site is in ISO-8859-1, you need to convert the text _from_ that encoding, not _to_ it. For your target encoding, you should probably choose UTF-8. – alesc Mar 03 '18 at 11:56
  • That diamond with a question mark is a placeholder character produced by a charset conversion, so repairing that character afterwards is out of the question. However, for pages declared as charset=ISO-8859-1 you should decode with Windows-1252, a superset that browsers substitute for ISO-8859-1 (officially so in HTML5). – Joop Eggen Mar 03 '18 at 11:57
  • I'm trying Iconv, Buffer, nothing is working. Should I not be setting some sort of encoding in the options when I call 'rp'? I'm also using the .text() function in jQuery... – pedroremedios Mar 03 '18 at 15:32
  • Maybe the response is not in iso-8859-1 encoding? – pedroremedios Mar 03 '18 at 16:00
  • Can you convert the Buffer to hex and post that? – Josh Lee Mar 03 '18 at 16:14
  • I got it working because the problem was cheerio or maybe I was using cheerio wrong. I used the solution posted here: https://stackoverflow.com/questions/12326688/node-js-scrape-encoding – pedroremedios Mar 03 '18 at 17:01
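The symptom and the fix discussed in the comments above can be reproduced without any network call. This is a minimal sketch, assuming the garbled word was "Algés" (a guess, since the original text is not recoverable from the question) and that the response body is available as a Buffer. With request or request-promise, passing `encoding: null` in the options should give you the body as a Buffer rather than an already decoded string. `decodeLatin1Body` is a hypothetical helper name.

```javascript
// Hypothetical helper: decode a raw response body as ISO-8859-1.
// Node's 'latin1' encoding maps each byte to the code point of the same value.
function decodeLatin1Body(buffer) {
  return buffer.toString('latin1');
}

// Bytes a server would send for "Algés" in ISO-8859-1 (assumed example word).
// With request-promise, a Buffer like this is what you would get back from
// something like rp({ uri: url, encoding: null }).
const body = Buffer.from([0x41, 0x6c, 0x67, 0xe9, 0x73]);

// Decoding as UTF-8 cannot interpret the lone 0xE9 byte, so it becomes U+FFFD.
const wrong = body.toString('utf8');

// Decoding from ISO-8859-1 recovers the accented character.
const right = decodeLatin1Body(body);

console.log(wrong);
console.log(right);
```

Once the body has been decoded as UTF-8, the original byte is gone, which is why the conversion has to happen on the raw bytes and not on the string.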

1 Answer

1

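If you are requesting the data via fetch, you can read the response as an ArrayBuffer and decode it yourself with TextDecoder:

// Fetch the raw bytes and decode them as ISO-8859-1 before parsing the JSON.
async function loadData() {
  const response = await fetch('isoEncodedApi/data.json');
  const arrayBuffer = await response.arrayBuffer();
  // Decode the bytes explicitly instead of letting fetch assume UTF-8.
  const textDecoder = new TextDecoder('iso-8859-1');
  const decodedResult = textDecoder.decode(arrayBuffer);
  return JSON.parse(decodedResult);
}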
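Since one of the comments raises the possibility that the response is not actually in the encoding the meta tag claims, it may help to take the charset from the Content-Type header when one is present. The following is only a sketch under that assumption: `decodeWithHeader` is a hypothetical helper, the regular expression handles just the common `charset=...` form, and an unsupported label would make `TextDecoder` throw a RangeError.

```javascript
// Hypothetical helper: choose the decoder from a Content-Type header value.
function decodeWithHeader(bytes, contentType, fallback = 'utf-8') {
  // Pull the charset parameter out of e.g. "text/html; charset=ISO-8859-1".
  const match = /charset\s*=\s*"?([^";\s]+)/i.exec(contentType || '');
  const label = match ? match[1].toLowerCase() : fallback;
  // TextDecoder throws a RangeError if the label is not supported.
  return new TextDecoder(label).decode(bytes);
}

// Example bytes for "Algés" in ISO-8859-1 (assumed sample data).
const sample = new Uint8Array([0x41, 0x6c, 0x67, 0xe9, 0x73]);

const withCharset = decodeWithHeader(sample, 'text/html; charset=ISO-8859-1');
const withoutCharset = decodeWithHeader(sample, 'text/html');

console.log(withCharset);
console.log(withoutCharset);
```

With fetch, the header value would come from `response.headers.get('content-type')`, and the bytes from `response.arrayBuffer()`.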

maxlmuc